DOCUMENT RESUME

ED 218 309                                                    TM 820 349

AUTHOR        Goldstein, Harvey; Ecob, Russell
TITLE         An Investigation of Models for the Estimation of Test
              Score Reliability Using Longitudinal Data.
INSTITUTION   London Univ. (England). Inst. of Education.
SPONS AGENCY  National Inst. of Education (ED), Washington, DC.
PUB DATE      [Oct 81]
GRANT         NIE-G-77-0065
NOTE          143p.
EDRS PRICE    MF01/PC06 Plus Postage.
DESCRIPTORS   Analysis of Variance; Elementary Secondary Education;
              *Error of Measurement; Longitudinal Studies;
              Mathematical Models; Path Analysis; *Scores; *Test
              Reliability; *Test Theory
IDENTIFIERS   *Instrumental Variable Methods; Quantitative Data;
              *Structural Equation Models

ABSTRACT

Using data from a National Child Development Study
(NCDS) in Great Britain, the applications of instrumental variable
methods and structural equation models to estimating instrumental
variables are presented. A subset of the longitudinal educational and
home background data on children born in England, Wales and Scotland
in a March week of 1958 is used. The models for quantitative
variables are discussed in terms of reliability in measurement error
variance, relationships and models for true score measures on two or
more occasions, and identification problems in measurement error. A
correction for unreliability or measurement error variance of the
independent variable, instrumental variable estimation, and
estimation in structural equation models are discussed. The methods
used with the NCDS data are summarized. The appendices include the
research project submission, a description of the NCDS, and a
comparison of the classical test score model and the latent variable
model. Further appendices discuss the distributions of variables,
transformations, instrumental variables theory and test score
reliability, LISREL applications, estimate inconsistency, missing
data and a social class variable. (CM)



***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made      *
*                    from the original document.                      *
***********************************************************************

U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or
organization originating it. Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this document do not necessarily
represent official NIE position or policy.



An Investigation of Models for the Estimation of Test Score Reliability
Using Longitudinal Data

Report of Research Project Supported by the
NIE under Grant No. NIE-G-77-0065

By

Harvey Goldstein
and
Russell Ecob

University of London Institute of Education
Bedford Way, London WC1H 0AL




CONTENTS

Preface

Chapter 1  Models for Quantitative Variables
    1.1 Models of measurement error
    1.2 Relationships between true scores on two occasions
    1.3 Models for true scores measured on more than two occasions
    1.4 Measurement error: identification problems

Chapter 2  Methods of Estimation
    2.1 Correction for unreliability or measurement error variance
        of the independent variable
    2.2 Instrumental variable estimation
    2.3 A unified approach to estimation in the just-identified case
    2.4 Estimation in structural equation models

Chapter 3  Applications of Instrumental Variable Methods and Structural
           Equation Models to NCDS Data

Chapter 4  Measurement Error in Categorised Variables

Chapter 5  Further Research

References

Appendix 1   The research submission for the project
Appendix 2   The National Child Development Study (NCDS)
Appendix 3   The Classical test score model and the latent variable model compared
Appendix 4   The distributions of variables and transformations
Appendix 5a  Theory of Instrumental variables estimation
Appendix 5b  Instrumental variable methods for the estimation of test score
             reliability
Appendix 6   Applications of LISREL to educational data
Appendix 7   Estimation of the inconsistency of instrumental variables
             estimates in the case of congeneric variables with
             correlated errors
Appendix 8   Missing data in the NCDS: the use and evaluation of a
             method of Beale and Little for estimating missing values
Appendix 9   Regression of attainment on social mix within social class:
             effect of errors in social class classification
Appendix 10  A description of two datasets used to estimate measurement
             error in social class






Preface

As indicated in the grant submission for this project (Appendix 1), whilst there
has recently developed a sizeable literature on statistical model building for
longitudinal data, there have been few attempts to study the applicability
of these models to real data. Two developments, largely since the submission,
have however influenced the course of the research reported here. Firstly,
the grant holder has completed his methodological research under NIE Grant
No. 400-75-0041. This covered part of the ground under part (a) of the
submission (see Goldstein, 1979) and drew attention to the need to compare
instrumental variable estimators using different choices of instrumental
variables with regard to their consistency. Secondly, more progress has been
made in the use of Structural Equation models in longitudinal data, particularly
by Karl Joreskog and his co-workers in a project on "Statistical methods for
the analysis of longitudinal data", and the computing difficulties have been
substantially alleviated with the availability of the LISREL program (Joreskog
and Sorbom, 1978), now into its 5th version. This suggested the use of these
procedures on the NCDS data.

As a result of the first development, a substantial part of this project has
been devoted to the examination of the method of instrumental variables
estimation. Estimates obtained using different variables as instrumental
variables are compared in the light of theoretically derived hypotheses about
their relative values in the regression of 16 year on 11 year scores, separately
for tests of reading and mathematics (see Appendices 5a, 5b).

Structural Equation models were applied in an exploratory sense to the regression
of 11 year on 7 year reading attainment and in a confirmatory sense to the
relationship between reading attainments over the three ages 7, 11 and 16, and the
parameter estimates compared with those obtained using the instrumental
variables method (see Appendix 6). In addition, a reanalysis is given of
an application of Structural Equations models to reliability estimation on
longitudinal data showing the dependence of the estimates on the particular
final models used (see Appendix 6). Expressions for the inconsistency of
Instrumental Variables estimates in terms of the correlations of the errors
of measurement of the variables involved are given and Structural Equations
methods are used to obtain estimates of these correlations (see Appendix 7).

Preliminary to any analysis of such a dataset as the National Child Development
Study (described in Appendix 2) it is necessary to check on the distributional
characteristics of the relevant variables and if necessary to carry out
transformations. A discussion is given (see Appendix 4) of possible
transformations of variables and the conditions for their use. Here the






possible conflict between transformations which give linearity of relationships
and those which give marginal normality is examined when the data do not
possess the property of multivariate normality. In addition, when many variables
are used simultaneously in an analysis, such as when using a number of
instrumental variables, background variables or multiple indicators of a
latent variable, the problem of partial non-response is heightened where
one or more of the relevant variables are missing for a particular case.
A method of interpolation of partial non-response due to Beale and Little
(1976) is examined on the NCDS data (see Appendix 8) and a comparison is made
of estimates obtained by this method, which uses the information from partial
non-respondents, with estimates obtained when all such cases are deleted.

Some thought has been given to the analysis of models of measurement error in
categorical data, in particular for obtaining measures of change in true social
class between two ages by correcting the observed social mobility matrix for
error in social class as suggested in the discussion in Goldstein (1979a).
This work is not reported here since we have been unable as yet to solve the
computing problems involved in obtaining reasonable estimates of the
conditional probabilities relating the true social class probabilities at different
ages. However, estimates of measurement error of social class obtained from
three data sets (see Appendix 10) were used in a model of the regression of
attainment on a measure of social class mix, correcting this measure for error
in social class (see Appendix 9). This model was tested on data from a
large literacy survey and the correction for measurement error was shown to
alter the conclusions substantially. Other points in the research submission
not examined in detail are the study of log-linear models and of scoring methods
for categorical data. On the log-linear models, initial work failed to give
promising results. This is not to deny the potential of these methods and the
comparisons suggested in the research submission are still considered valuable.
The scoring methods for categorical data were not examined in detail, though
an investigation was made of the effect on the parameters of the structural
equation models of alternative scoring methods for the teacher ratings
(see Appendix 6).

The report has been divided, for convenience, into five chapters and
ten appendices, with individual responsibility for the latter being indicated
where appropriate.


Russell Ecob
Harvey Goldstein

October 1981


REFERENCES

1. GOLDSTEIN, H. (1979) Some Models for Analysing Longitudinal Data on
   Educational Research. J. R. Statistical Soc. A, 142, 407-442.

2. JORESKOG, K G & SORBOM, D (1978) LISREL IV: Users Guide.



ACKNOWLEDGEMENTS

We wish to thank the following people who contributed to the Project:
Ian Plewis, Steven Simpson and Dougal Hutchison, who provided useful comments
from time to time, and Shirley Preeman and Kay Pilpel, who provided valuable
help with the typing.



1. Models for Measurement of Quantitative variables

1.1 Models of measurement error

We briefly describe classical test score theory and latent variable theory, 
give a definition of reliability and show its relation to measurement error 
variance. 



Let $x_{ijk}$ be a measurement of individual i on a test item j at occasion k, then
the classical test theory model is that

$$x_{ijk} = T_{ijk} + u_{ijk} \tag{1}$$

where $u_{ijk}$ is the measurement error and $T_{ijk}$ is the true value or true score.
The error is a random variable defined as having zero expectation over
replications and zero correlation with the true score. The variance of the
measurement error is $\sigma^2_{ujk}$.



Though sometimes described as a latent variable model this model is essentially
different in that expectations are taken over replications within individuals
as opposed to over individuals in the latent variable model. This latter model
is contrasted with the classical test theory model in Appendix 3, where a
formal axiomatic basis for the two models is given and it is shown that an
additional axiom is required for the latent variable model in order that it
can be used in the classical test theory context, and traditional reliability
estimates used. This extra axiom is that the covariance of errors of any
two individuals over replications is zero and is called the axiom of
experimental independence between persons. Conditions under which this
axiom may not hold are given in Appendix 3 as well as references to the
relaxation of this axiom. This axiom will be assumed to hold for the remainder
of this report where latent variables and classical test theory are used
interchangeably. We will use the term true score in all contexts.



For a given occasion and item or test, x, the reliability (R) of x is given by
(we omit the individual subscripts from now on)

$$R = \frac{\sigma^2_T}{\sigma^2_x} \tag{2}$$

where $\sigma^2_T$ is the variance of true scores over individuals and thus

$$\sigma^2_u = (1 - R)\,\sigma^2_x \tag{3}$$
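
As a minimal numerical sketch of this definition, assuming the usual form $R = \sigma^2_T / \sigma^2_x$ with $\sigma^2_x = \sigma^2_T + \sigma^2_u$, the Python fragment below simulates true scores and errors and compares the sample variance ratio with the theoretical value. The function names, the normal distributions and the chosen variances are illustrative assumptions and are not taken from the NCDS data.

```python
import random
import statistics


def reliability(var_true, var_error):
    """R = var(T) / var(x), where var(x) = var(T) + var(u)."""
    return var_true / (var_true + var_error)


def error_variance(var_observed, r):
    """Measurement error variance implied by a reliability r: (1 - R) * var(x)."""
    return (1.0 - r) * var_observed


def simulate(n, sd_true, sd_error, seed=0):
    """Draw true scores T and observed scores x = T + u (illustrative normal draws)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, sd_true) for _ in range(n)]
    observed = [t + rng.gauss(0.0, sd_error) for t in true_scores]
    return true_scores, observed


true_scores, observed = simulate(20000, sd_true=2.0, sd_error=1.0)
r_theory = reliability(4.0, 1.0)
# In real data T is unobserved, so this ratio is only available in a simulation.
r_sample = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(round(r_theory, 3), round(r_sample, 3))
```

The sample ratio differs from the theoretical value only by sampling fluctuation, which is why estimating R in practice requires extra information such as replicate measurements or instrumental variables.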



For a test composed of many items an assumption of local independence between
items is necessary for the use of both classical test theory and latent variable
theory. This is described also in Appendix 3 and it is shown
that it is required in the formulae for the traditional reliability estimates
which make use of the relations between the items in a test.

1.2 Relationships between true scores on two occasions

One possibility for modelling the relation between variables is to
standardise the distributions at each occasion, thus assuming a common scale.
When this is done and the crude difference between the variables is
considered as a measure of change we have the unconditional model for change
over time. This model has deficiencies when comparing changes between 2 or
more subgroups of the population (formed by dividing the population say in
terms of sex or social class) as the variation within each of the groups may
not then be constant across occasions. In addition, the error variances on
the two occasions are unknown and possibly unequal for different subgroups,
so that the subgroup variances of the true scores on the separate occasions
are unknown.


In the remainder of this report we consider the conditional regression model
for true scores,

$$T_2 = \alpha + \beta T_1 + e \tag{4}$$

The rationale for choosing this model is discussed in detail by Goldstein
(1979). Briefly, inferences drawn from this model are robust against
non-linear scale transformations and the model implicitly incorporates
the time asymmetry present in the real world. Neither of these properties is
shared by the simple change model.


1.3 Models for true scores measured on more than two occasions

We consider measurement at three separate occasions and the generalisation
to more than three is straightforward. Linear regression equations relating
the true scores for a given attainment over the three occasions can be
written as

$$T_2 = \alpha_1 + \beta_1 T_1 + e_1 \tag{5}$$

$$T_3 = \alpha_2 + \beta_2 T_1 + \gamma_2 T_2 + e_2 \tag{6}$$

A system of equations of this form is called a recursive system as for each
new equation in the system a variable is introduced which is not present in
any previous equation in the system. The identification of this system, or
the existence of unique parameter estimates, is discussed by Johnston (1972,
p 365). It requires in particular that the correlation between the errors
$e_1$ and $e_2$ is zero.


A zero correlation between these errors is necessary in order that the Ordinary
Least Squares (OLS) estimates when applied to each equation separately give
efficient estimates, and for each equation a zero correlation between these
errors and the independent variables is necessary for consistent estimates.

Neither of these conditions will hold if the equations are mis-specified, for
instance by the exclusion of a relevant variable, or for example by the exclusion
of a quadratic or higher order term in one of the independent variables in the
equation.

For the NCDS data both for reading and mathematics attainment, almost linear
relationships between observed scores at each pair of ages 7, 11 and 16 can
be obtained by suitable transformations of test scores which also produce
a ratio of maximum to minimum variance around the regression line not exceeding
2.0. This is achieved for both attainments by empirical transformations which
give standard normal distributions for the 11 and 16 year scores, the 7 year
test value for reading being transformed to give a linear relationship with
the 11 year scores. The mathematics 7 year raw score is roughly normally
distributed and linearly related to the 11 year score without transformation.
The transformation of the 7 year reading score is required because of the strong
ceiling effect on this test (22% of the observations were in the top 2 out
of 30 values) and though rendered more normal by this transformation, the
skewness and kurtosis remain high, -1.1 and 3.3 respectively.

Thus mis-specification in the above models will be due only to omitted
variables rather than non-linearities. One relevant omitted variable is
Social Class and, including this variable as a set of dummy variables (w),
Equations (5) and (6) become
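
The extended equations are not legible at this point. A form consistent with (5) and (6), assuming the social class dummy variables $w_j$ enter additively, would be as sketched below. The coefficient symbols $\delta_{1j}$ and $\delta_{2j}$ are hypothetical labels introduced only for this sketch, $\gamma_2$ denotes the coefficient of $T_2$ in (6), and the numbering (7) and (8) is inferred from the later reference to equation (8).

```latex
\begin{align}
T_2 &= \alpha_1 + \beta_1 T_1 + \sum_j \delta_{1j} w_j + e_1 \tag{7} \\
T_3 &= \alpha_2 + \beta_2 T_1 + \gamma_2 T_2 + \sum_j \delta_{2j} w_j + e_2 \tag{8}
\end{align}
```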



This model still does not accommodate changes in Social Class between the
occasions and this is accomplished either by including the values at each
occasion in equation (8) or by including the value at the 2nd occasion and
the change between the 1st and 2nd occasion, the latter giving an easier
interpretation as well as more precise parameter estimates, since social
class is highly associated at occasions 1 and 2.



The model can be further extended to the case where two dependent variables,
for example mathematics and reading attainment, are related across occasions.
This case has been considered in detail by Goldstein (1979) and we shall
pursue it further.



1.4 Measurement error: identification problems

Consider first the case of one independent variable and the relationship
between the true scores given in equation (4).

We now add the measurement equations from classical test score theory (equation 1)

$$x_1 = T_1 + u_1 \tag{9}$$

$$x_2 = T_2 + u_2 \tag{10}$$

where $\mathrm{Var}(T_1) = \sigma^2_1$, $\mathrm{Var}(u_i) = \sigma^2_{u_i}$, and assume that under the classical test
theory axioms the covariances between the measurement errors $u_1$, $u_2$ and between
the measurement errors $u_1$, $u_2$ and the disturbance term in (4), $e$, are all zero.
If we assume in addition that the variables $T_1$, $T_2$ and the errors $u_1$, $u_2$, $e$
are normally distributed then all the distributional information is contained
in the first two moments as the observed variables are also normally distributed.



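We have

$$\begin{aligned}
E(x_1) &= \mu_1 \ \text{say} \\
E(x_2) &= \alpha + \beta\mu_1 \\
\mathrm{Var}(x_1) &= \sigma^2_1 + \sigma^2_{u_1} \\
\mathrm{Var}(x_2) &= \beta^2\sigma^2_1 + \sigma^2_v \\
\mathrm{Cov}(x_1, x_2) &= \beta\sigma^2_1
\end{aligned} \tag{11}$$

where $\sigma^2_v = \sigma^2_{u_2} + \sigma^2_e$,
which relates the 7 unknown parameters to the 5 observed means, variances
and covariances. Note, however, that $\sigma^2_v$ is sufficient for inferences about $\beta$ so
that we will consider only the estimation of the six unknowns $\alpha$, $\beta$, $\mu_1$, $\sigma^2_1$, $\sigma^2_{u_1}$,
$\sigma^2_v$.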

These equations may also be obtained from the likelihood function of
the observations. By examination of these equations it can be seen that once $\beta$
is determined, $\alpha$ and $\mu_1$ can be found by substitution. To solve the remaining
equations we need a further restriction. One possible restriction is $\sigma^2_{u_1} = k\sigma^2_1$,
which corresponds to the use of a value $R_1$ of the reliability of $x_1$,
viz. $k = (1 - R_1)/R_1$. Other possible restrictions are $\sigma^2_{u_1}$ known, or the ratio
$\sigma^2_{u_1}/\sigma^2_v$ known. The restriction $\sigma^2_e = 0$ will sometimes hold in the physical
sciences and is the case dealt with by Madansky (1960).
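
To make the role of the restriction concrete, the sketch below solves the moment equations (11) for the six unknowns once a value $R_1$ of the reliability of $x_1$ is supplied. It assumes the moment equations take the form $\mathrm{Var}(x_1) = \sigma^2_1 + \sigma^2_{u_1}$, $\mathrm{Var}(x_2) = \beta^2\sigma^2_1 + \sigma^2_v$ and $\mathrm{Cov}(x_1, x_2) = \beta\sigma^2_1$. The function name and the numerical moments are hypothetical.

```python
def solve_moments(mean1, mean2, var1, var2, cov12, r1):
    """Solve the moment equations for the six unknowns, given the reliability r1 of x1."""
    if not 0.0 < r1 <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    var_t1 = r1 * var1                  # sigma_1^2 = R_1 * Var(x_1)
    var_u1 = var1 - var_t1              # equals k * sigma_1^2 with k = (1 - R_1) / R_1
    beta = cov12 / var_t1               # from Cov(x_1, x_2) = beta * sigma_1^2
    alpha = mean2 - beta * mean1        # from E(x_2) = alpha + beta * mu_1
    var_v = var2 - beta ** 2 * var_t1   # from Var(x_2) = beta^2 * sigma_1^2 + sigma_v^2
    return {"alpha": alpha, "beta": beta, "mu1": mean1,
            "var_t1": var_t1, "var_u1": var_u1, "var_v": var_v}


# Hypothetical moments generated from alpha = 1, beta = 0.9, mu_1 = 10,
# sigma_1^2 = 4, sigma_u1^2 = 1 and sigma_v^2 = 2, so that R_1 = 0.8.
est = solve_moments(mean1=10.0, mean2=10.0, var1=5.0, var2=5.24, cov12=3.6, r1=0.8)
print(est)
```

Without the supplied reliability the same five moments are compatible with a continuum of parameter values, which is the identification problem described above.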

The set of equations (11) may also be identified by using extra information from
other variables. These may be either replicate observations on $x_1$
or other, correlated, variables known as 'instrumental' variables.



With replicate observations where their measurement errors are
uncorrelated, this provides an estimate of $\sigma^2_{u_1}$ and hence $R_1$,
leading to identification.

The instrumental variable, X, is assumed to have zero covariance
with the errors $u_2$, $u_1$ and $e$. We then have $\mathrm{cov}(x_2, X) = \beta\,\mathrm{cov}(x_1, X)$ and
we obtain an estimate of $\beta$ which allows the other 3 parameters to be
determined uniquely.
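
The covariance relation for the instrumental variable can be illustrated with a small simulation. The sketch below is not the analysis of Appendix 5b: the data are simulated, the instrument is called `z` rather than X, and all variances are illustrative assumptions. The instrument is constructed to be related to the true score while having errors independent of $u_1$, $u_2$ and $e$.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences (divisor n)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def ols_slope(x1, x2):
    """Ordinary least squares slope of x2 on x1."""
    return covariance(x1, x2) / covariance(x1, x1)


def iv_slope(x1, x2, z):
    """Instrumental variable slope: cov(x2, z) / cov(x1, z)."""
    return covariance(x2, z) / covariance(x1, z)


rng = random.Random(1)
n, alpha, beta = 20000, 1.0, 0.9
t1 = [rng.gauss(0.0, 1.0) for _ in range(n)]            # true scores, variance 1
x1 = [t + rng.gauss(0.0, 0.5) for t in t1]              # reliability 1 / 1.25 = 0.8
x2 = [alpha + beta * t + rng.gauss(0.0, 0.3) + rng.gauss(0.0, 0.5) for t in t1]
z = [t + rng.gauss(0.0, 1.0) for t in t1]               # instrument with independent error

beta_ols = ols_slope(x1, x2)    # attenuated towards beta * R = 0.72
beta_iv = iv_slope(x1, x2, z)   # close to beta = 0.9
print(round(beta_ols, 3), round(beta_iv, 3))
```

The instrument must also be correlated with $x_1$; otherwise the denominator `covariance(x1, z)` is near zero and the ratio is unstable.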

We may also ask what are the general conditions under which this model
without extra information is not identified. It turns out (Reiersol, 1950)
that the only conditions under which identification does not hold are
when $T_1$, $T_2$ are normally distributed or are constant; the lack of
identification resulting from the absence of information about the
parameters from moments higher than the second which are all zero in the
normal distribution case. A generalisation of this condition to the
multiple regression case is that the parameter vector $\beta$ is identified if
and only if there exists no linear combination of the vector T which is
normally distributed (Aigner, Kapteyn and Wansbeek, 1981). The simple model
is identified even when $T_1$ is normally distributed if neither of the
error terms (those with variances $\sigma^2_{u_1}$ and $\sigma^2_v$) has a normal distribution (Reiersol, 1950).

Where $u_1$, $u_2$ and T have non-zero covariances then independent estimates
of these values are required for identification.



2. Methods of Estimation

2.1 Correction for unreliability or measurement error variance of the
independent variable

For the one independent variable case, we have from (4)

$$x_2 = \alpha + \beta x_1 + (e - \beta u_1 + u_2)$$

and it can be seen that, due to the presence of the term $u_1$, whose correlation
with $x_1$ is $\sqrt{1-R}$, the error term is now negatively correlated with $x_1$
(for positive $\beta$).

The ordinary least squares (OLS) estimator, $\hat\beta_{OLS}$, of $\beta$ then has expectation,
in the limit as the sample size tends to infinity,

$$\beta R = \beta\,\frac{\sigma^2_1}{\sigma^2_1 + \sigma^2_{u_1}} \tag{12}$$

and a consistent estimator for $\beta$ is thus obtained by

$$\beta^*_{OLS} = \hat\beta_{OLS} / R \tag{13}$$

In finite samples of size n the expectation of $\beta^*_{OLS}$ has been shown by
Richardson and Wu (1970) to be

$$E(\beta^*_{OLS}) = \beta\left(1 + \frac{2R(1-R)}{n}\right) + O\!\left(\frac{1}{n^2}\right)$$

This gives a bias of less than 1 in 10,000 for the present data.

In the NCDS data then, where the reliability (R) is not known precisely
but an unbiased estimate of R, distributed independently of $\hat\beta_{OLS}$, is
available, $\beta^*_{OLS}$ is almost unbiased. The measurement error of the
reliability estimate itself will inflate the variance of $\beta^*_{OLS}$ and this
can be taken into account (Fuller and Hidiroglou (1978)).
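
The correction itself is a one-line calculation, sketched below. The finite-sample term is taken to be $2R(1-R)/n$, which is one reading of the expression quoted from Richardson and Wu (1970) and should be treated as an assumption. The slope, reliability and sample size used in the example are hypothetical and are not NCDS estimates.

```python
def disattenuate(beta_ols, reliability):
    """Corrected slope: the OLS slope divided by the reliability of the regressor."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return beta_ols / reliability


def relative_bias_term(reliability, n):
    """Leading relative bias of the corrected slope, assumed to be 2R(1 - R) / n."""
    return 2.0 * reliability * (1.0 - reliability) / n


corrected = disattenuate(0.72, 0.8)      # hypothetical OLS slope and reliability
bias = relative_bias_term(0.8, 10000)    # hypothetical sample size
print(round(corrected, 3), bias)
```

With a sample of this order the relative bias term is well below 1 in 10,000, so in practice the uncertainty in the reliability estimate matters far more than the finite-sample bias.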

A number of similar methods are available for the multiple regression
case to take into account known or estimated variances and covariances
of measurement errors (Hidiroglou, Fuller, Hickman, 1979).

Estimation procedures have been described for the following cases

a) when the reliability of each independent variable is either known or
   estimated, when the reliability of the dependent variable may or may
   not be known (Fuller and Hidiroglou, 1978).

b) for general $\Sigma_{uu}$, $\Sigma_{uw}$, $\sigma^2_w$ being known or for which estimates are
   available, $\sigma^2_e$ being unknown but positive (Fuller, 1980).

c) when $\Sigma_{uu}$, $\Sigma_{uw}$ only are known and where $\sigma^2_w$ and $\sigma^2_e$ are unknown but
   positive (Hidiroglou, Fuller and Hickman, 1979) and

d) for the above case where $\Sigma_{uw}$ is zero, $\Sigma_{uu}$ not being assumed
   diagonal (Fuller, 1980), the specialisation to diagonal $\Sigma_{uu}$ being
   given by (Warren et al, 1974).

All these results apply more generally for multiple dependent variables.
The estimation methods are all based on least squares and make no
assumption about the distribution of the independent variables.


2.2 Instrumental Variable estimation

A drawback to these methods is the need to make the assumption that
$\Sigma$ (the covariance matrix of the measurement errors) is known, or in
particular is zero, or can be estimated. The Instrumental
Variable methods, which use extraneous information, provide consistent
estimates of $\beta$ and $\mathrm{Cov}(\hat\beta)$ even when $\Sigma$ is unknown, provided the instrumental
variable used in conjunction with a particular independent variable is
uncorrelated with the error in both the independent and dependent variables
and also with the equation disturbance term.

This method was used by Goldstein (1979) to correct for the measurement error
of 7 year scores of reading and mathematics using teacher ratings at
the same age, since no good estimate of the reliability of the test
was available. Ecob and Goldstein (1981) examined the suitability of
instrumental variable estimation for estimation of change in reading and
mathematics between the ages of 11 and 16 in the same study by comparing
estimates using different instrumental variables and after having formulated
hypotheses as to their likely values. This paper is reproduced as
Appendix 5b, the theory of the method being given in Appendix 5a.

As the OLS estimator consistently estimates $\beta R$, the instrumental variable
estimator will also give a consistent estimate of the reliability (R) of
the independent variables by dividing the OLS estimator by the instrumental variable
estimator. An expression for the asymptotic Variance-Covariance estimate of the vector
of regression coefficients, measurement errors of independent variables
and disturbances is given in Kapteyn and Wansbeek (1978) which enables
the standard errors of the reliability estimates to be obtained.

2.3 A unified approach to estimation in the just-identified case

Kapteyn and Wansbeek (1978) present an estimator for the multiple regression
situation which includes the estimators in a) and b) above as special
cases.

The consistent adjusted least squares (CALS) estimator is of the form

$$\hat\beta_{CALS} = (X'X - n\Omega)^{-1}(X'X)\,b_{OLS}$$

where $\Omega$ is the variance covariance matrix of the errors, u, in X and $b_{OLS}$
is the ordinary least squares regression estimator. As $\Omega$ is not generally
known an identifying restriction is made which is either exact, in general
$F(\beta, \sigma^2, \Omega) = 0$, or stochastic, $F(\beta, \sigma^2, \Omega) = \lambda$, where $\lambda$ is an unknown
vector of random variables.
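
A minimal sketch of the adjusted least squares calculation for two centred regressors is given below, assuming the estimator takes the form $(X'X - n\Omega)^{-1}X'y$, which equals $(X'X - n\Omega)^{-1}(X'X)\,b_{OLS}$. The helper names and the moment matrices are hypothetical; the example uses exact population moments rather than sample data so that the arithmetic can be checked by hand.

```python
def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix a (list of rows) and a length-2 vector b."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("singular matrix")
    return [(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def ols(xtx, xty):
    """Ordinary least squares coefficients from X'X and X'y."""
    return solve_2x2(xtx, xty)


def cals(xtx, xty, n, omega):
    """Adjusted least squares: subtract n * Omega from X'X before solving."""
    adjusted = [[xtx[i][j] - n * omega[i][j] for j in range(2)] for i in range(2)]
    return solve_2x2(adjusted, xty)


# Hypothetical moments for n = 100: true-score covariance [[1, .5], [.5, 1]],
# true coefficients (1, 2), error variance 0.25 on the first regressor only.
n = 100
xtx = [[125.0, 50.0], [50.0, 100.0]]
xty = [200.0, 250.0]
omega = [[0.25, 0.0], [0.0, 0.0]]

b_ols = ols(xtx, xty)
b_cals = cals(xtx, xty, n, omega)
print(b_ols, b_cals)
```

In this example the error in the first regressor biases both OLS coefficients, not only its own, and the adjustment recovers the true values (1, 2).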



2.4 Estimation in Structural equation models

The two principal programs available, COSAN (McDonald, 1980) and LISREL
(Joreskog and Sorbom, 1981), both now offer a variety of estimation methods
(least squares, generalised least squares and maximum likelihood). (See also Bentler
& Weeks, 1980).

The option of generalised least squares estimation in LISREL V (Joreskog
and Sorbom, 1981) allows the modelling of data which are not of multivariate
normal form, the maximum likelihood estimates having unknown distributional
properties in that case. The program used in the application of structural equation
modelling in Appendix 6, LISREL IV, uses maximum likelihood estimation
methods and some investigation of the effect of the non normal distribution
on the parameter estimates is made.


3. A Summary of the results of Instrumental Variable estimation and
Structural equation modelling applied to the NCDS Data

We here summarise the approach used and conclusions reached by Ecob
and Goldstein (1981) using instrumental variables. A number of possible
variables were examined separately as possible choices of instrumental
variables in the estimation of regression of attainment at 16 years on
attainment at 11 years in reading and mathematics separately. These
included teacher ratings at ages 7, 11 and 16 of a variety of attainments
and skills and also the social class of the father when the child was at
each of these ages. Then a number of hypotheses were set up, motivated
by theoretical expectations regarding the relationship between particular
instrumental variables and the errors of measurement in the independent
and dependent variables separately and the disturbance term of the
regression equation. These related, for the teacher ratings, to whether
they measured the same or different attainment and whether they were
measured on the same occasion as the independent or dependent variables.

The results suggested that teacher ratings on the same attainment as that
tested, when taken at the same time as the tests, were positively correlated
with test score error, and that this correlation was lower when the
teacher rating was of a different attainment from that tested but still
persisted when the rating was taken at a different time from the test.
However, teacher ratings were uncorrelated with the disturbance terms.
In contrast, social class was correlated with disturbance terms though not
with test score error. Whilst none of the instrumental variables exactly
satisfied the conditions for a consistent estimate, the correlations with
test score measurement errors of the teacher ratings worked in opposite
directions for the dependent and independent variables. Excluding ratings
taken at the same occasion as the dependent variable and also social class
gave estimates of the regression coefficient within a reasonably narrow
range (0.94 to 0.99 for reading attainment and 0.84 to 0.92 for mathematics
attainment) which was of the same order as the standard error (0.13 to 0.18).

The estimated standard errors using suitable instrumental variables individually
were shown to be less than was obtained by the split half method used by Goldstein
(1979) on 300 cases, though not as low as obtainable by the split half
method applied to the whole data. The reliabilities of reading and mathematics
attainment were also examined separately in different social classes by
this method and different values for estimates of both reliabilities and
of the variance of measurement error were found. These allowed estimates of
the true correlation between attainments at each age within social class
to be made.



The structural equation model 1 i nf* approach is npp'ied to the NCPS data 
in, Appendix 6 and we briefly summarise here the procedures used and 
, conclusions reached. Though not always clear cut there is a distinction 
to he made between exploratory analyses in which the model is systematically 
extended to involve larger numbers of parameter.*. ii\ order to provide a 
better fi t to the Id at a and confirmatory an. il yscs where rc^tri ctions are made 
to the model and aycepted according to tests of fit. 

The exploratory analyses were used in> an investigation of reading attainment 
at ages 7 and II. The change in reading attainment between these ages was 
examined using the conditional regression relation of equation CO. 
The analyses suggested that the addition of either of two extra indicators . 
had little effect on the parameter' estimates. 

Tlje relationship between reading attainment at the three ages 7, It and 16 
was then examined. A substantial improvement in fit was found by assuming 
a test specific factor for the reading tests at each age and this "\s 
found to load pat ticulnrly highly on the reading tests at II and 16 (the 
same reading test was used on these occasions). The addition of a test 
specific factor for teacher ratings further improved the fit of the model! 

The estimates of the structural relationship parameter were compared with the
instrumental variable estimates and broad agreement was found.



4. Measurement Error in categorised variables

The estimation procedures for models with quantitative variables subject to
measurement error assume constancy of measurement error distributions,
independent of true values. Where the measurement error is in discrete
or categorised variables, however, this distribution will not generally
be independent of the true value or category. Thus, the probability of
misclassifying an observation will in general depend on the true underlying
category to which it belongs.

In Appendix 8, a simple model is proposed for analysing measurement errors
in social class, assuming just two categories and known misclassification
probabilities. The results show that quite large adjustments to model
parameters are obtained when estimating true score coefficients and this
suggests that there is a need systematically to develop methods for
dealing with such data.
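The size of such adjustments can be illustrated with a minimal sketch. It is not the model of Appendix 8, only a simple two-category case under stated assumptions: the misclassification probabilities `p01` and `p10` are known, the response depends only on the true category, and misclassification is unrelated to the response within a true category. The function name and arguments are hypothetical.

```python
def correct_two_class_means(q_obs, mean_obs0, mean_obs1, p01, p10):
    """Recover the true class proportion and class means from recorded ones.

    q_obs      proportion of cases recorded in class 1
    mean_obs0  mean response among cases recorded in class 0
    mean_obs1  mean response among cases recorded in class 1
    p01        P(recorded 1 | true 0), assumed known
    p10        P(recorded 0 | true 1), assumed known
    """
    denom = 1.0 - p01 - p10
    if denom <= 0.0:
        raise ValueError("misclassification rates too large to separate the classes")
    # q_obs = pi * (1 - p10) + (1 - pi) * p01, solved for the true proportion pi
    pi = (q_obs - p01) / denom
    if not 0.0 < pi < 1.0:
        raise ValueError("recorded proportion is inconsistent with the error rates")
    w11 = pi * (1.0 - p10) / q_obs        # P(true 1 | recorded 1)
    w10 = pi * p10 / (1.0 - q_obs)        # P(true 1 | recorded 0)
    # Each recorded-class mean is a mixture of the two true-class means, so the
    # recorded difference is the true difference shrunk by the factor (w11 - w10).
    diff_true = (mean_obs1 - mean_obs0) / (w11 - w10)
    mu0 = mean_obs0 - w10 * diff_true
    mu1 = mu0 + diff_true
    return pi, mu0, mu1


if __name__ == "__main__":
    pi, mu0, mu1 = correct_two_class_means(0.38, 11.29, 18.42, p01=0.10, p10=0.20)
    print(round(pi, 3), round(mu0, 2), round(mu1, 2))
```

Even modest misclassification rates shrink the recorded difference between classes noticeably, which is why the corrected coefficients can differ substantially from the uncorrected ones.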



Further Research

The project has shown how a large longitudinal data set can be used
empirically to provide estimates of measurement error variance. Two particular
areas of further research have also been identified, viz.

1. The extension of structural equation models to handle
partial information (i.e. sample estimates) about measurement
error variables.

2. The development and the empirical testing of models
for measurement error in categorical data.

It is our view also that the present project has demonstrated the need
for empirically based data analysis to study the assumptions of the various
models of measurement error. While we see a need for yet more theoretical
development, there is a danger that this could outrun the ability of
existing data to discriminate between alternative model assumptions. In
particular, the NCDS data and other similar large data sets should be
exploited fully in the development of new techniques.






REFERENCES

Most references for instrumental variables estimation, structural equation
modelling and effects of social mix are contained in the relevant appendices
and will be omitted from this reference list. This list, however, includes
references for Appendices 2, 3, 4, 8, 10.

AIGNER, D J, KAPTEYN, A & WANSBEEK, T J (1981)
    Latent Variables in Economics and elsewhere.
    In Intriligator: Handbook of Econometrics (forthcoming)

BEALE, E M L & LITTLE, R J A (1975)
    Missing Values in Multivariate Analysis.
    Journal of the Royal Statistical Society, B, 37, 129-145

BENTLER, P M & WEEKS, D G (1980)
    Linear Structural Equations with Latent Variables.
    Psychometrika, 45, 289-305

BOX, G E P & COX, D R (1964)
    An Analysis of Transformations.
    Journal of the Royal Statistical Society, B, 26, 211-252

ECOB, R & GOLDSTEIN, H (1981)
    Instrumental Variable Methods for the Estimation of Test Score Reliability.
    Submitted for Journal publication

FOGELMAN, K (1976)
    Britain's Sixteen Year Olds.
    National Children's Bureau, London

FOX, T & ALT, J (1976)
    The Reliability of Occupational Coding.
    Paper presented to the Seminar on Longitudinal Studies,
    New College, Cambridge, March 1976

FULLER, W A & HIDIROGLOU, M A (1978)
    Regression Estimation after Correction for Attenuation.
    Journal of the American Statistical Association, 73, 99-104

FULLER, W A (1980)
    Properties of some estimators for the errors-in-variables model.
    The Annals of Statistics, 8, 407-422

GEARY, R C (1943)
    Relations between statistics: the general and the sampling problem
    when the samples are large.
    Proc. R. Irish Academy, A, 49, 177

GOLDSTEIN, H (1979)
    Some Models for analysing data on Educational Attainment
    (including Discussion).
    Journal of the Royal Statistical Society, A, 142, 407-472

GUTTMAN, L (1945)
    A basis for analysing test-retest reliability.
    Psychometrika, 10, 255-282

GUTTMAN, L (1953)
    Reliability formulas that do not assume experimental independence.
    Psychometrika, 18, 225-239



GUTTMAN, L (1969)
    Review of Lord & Novick: Statistical Theories of Mental Test Scores.
    Psychometrika, 34, 398-404

HEALY, M J R & GOLDSTEIN, H (1976)
    An approach to the Scaling of Categorised Attributes.
    Biometrika, 63, 219-229

HOPE, K, GRAHAM, S & SCHWARZ, J R (1974)
    Uncovering the Pattern of Social Stratification: a two year
    test-retest enquiry.
    Internal Report, Nuffield College, Oxford

JORESKOG, K G & SORBOM, D (1981)
    LISREL V Users Guide (forthcoming)

HIDIROGLOU, M A, FULLER, W A & HICKMAN, R D (1979)
    SUPERCARP 5th Edition.
    Statistical Laboratory, Iowa State University

KELDERMAN, H (1981)
    LISREL Models for Inequality Constraints in Factor and
    Regression Analysis.
    Paper read to the Seminar on structural equation modelling with
    particular reference to LISREL, held at the Institute of Education,
    London University, September 1981

KAPTEYN, A & WANSBEEK, T J (1978)
    Errors in Variables: Consistent and adjusted least squares estimation.
    In: Multivariate Analysis V (1980). Ed: Krishnaiah

KEMPF, W (1977)
    Dynamic Models for the Measurement of "Traits" in social behaviour.
    In Kempf, W & Repp, B (Eds) Mathematical Models for Social Psychology.
    Huber, Bern; Wiley, New York

KENDALL, M G & STUART, A (1973)
    The Advanced Theory of Statistics, Vol 2, Third Edition.
    Griffin, London

LORD, F M & NOVICK, M R (1968)
    Statistical Theories of Mental Test Scores.
    Addison-Wesley, Reading, Massachusetts

MARINI, M M, OLSEN, A R & RUBIN, D B (1979)
    Maximum Likelihood Estimation in Panel Studies with Missing Data.
    In K F Schuessler (Ed) Sociological Methodology, 1980.
    Jossey-Bass, San Francisco

McDONALD, R P (1981)
    The Dimensionality of Tests and Items.
    British Journal of Mathematical and Statistical Psychology, 34, 100-117



RICHARDSON, D H & WU, D M (1970)
    Least Squares and Grouping method estimation in the errors in
    variables model.
    Journal of the American Statistical Association, 65, 724-748

REIERSOL, O (1950)
    Identifiability of a linear relation between variables which are
    subject to error.
    Econometrica, 18, 375-389

SCOTT, E L (1950)
    Note on consistent estimates of the linear structural relation
    between two variables.
    Annals of Mathematical Statistics, 21, 284-288

WARREN, R D, WHITE, J K & FULLER, W A (1974)
    An errors-in-variables analysis of managerial role performance.
    Journal of the American Statistical Association, 69, 886-893



Introduction

Since the bringing together of the first broad collection of papers
dealing with the problems of longitudinal studies (Harris, 1963),
research workers, especially in the child development field, have
become extremely interested in designing and analysing such studies.
It has been recognised by such workers that certain questions of interest
can be answered only with longitudinal data and hence there has been a
practical stimulus to the development of appropriate methodology,
especially that concerned with statistical model building. Much of
this model building has developed from the original covariance
structures model of Joreskog (1970), and there is now a sizeable
literature dealing with alternative models for analysis; a useful short
bibliography is given by Joreskog and Sorbom (1976).

Along with these developments, however, there seem to have been few
attempts to study the applicability of different specialisations of the
models to real data. Because of the need to incorporate parameters,
especially measurement errors and high order time lags, these models
tend to be overparameterised. Thus, in any practical application
particular parameter values or relations between parameters need to be
specified. There is a similar problem with more traditional techniques
such as factor analysis, and experience with their application to real
data suggests that the problems are not easy to resolve.

It appears to the applicant, therefore, that a useful contribution to the
subject at its present stage of development would be the testing of some
of the assumptions in these models with a view to obtaining specialisations
which come as close as possible to realistic descriptions of actual data.
Two broad approaches are available in tackling this question. First, one
may attempt to simulate realistic situations and hence compare the
performance of alternative models. Although useful, this approach would
be not only very time consuming, but would lose much of its usefulness
unless the simulated structures were in fact known to be realistic.
For this reason it seems logically to come after a second approach
has been tried, and with which this application is largely concerned.
This second approach involves the application, testing and further
development of theoretical and statistical models for the analysis of
longitudinal educational and social data, using data obtained from an
extensive and representative sample of individuals. The data set it
is proposed to use, known as the National Child Development Study, is
briefly described in the next section, following which the specific
aims of the project are detailed.

The National Child Development Study

The data set which will be used in the investigation consists of
measurements made on the total cohort of children born in Britain
during 3rd-9th March, 1958. These 17,000 babies were the subject of a
large survey at birth, and at the ages of 7, 11 and 16 years. At the
three latter ages, a large amount of educational data were obtained as
well as social, physical and medical data. At the age of 16, about
Bn of the survivors still living in Britain provided information, some
14,000. Preliminary investigations (Goldstein, 1976) suggest that no
serious response bias exists for the basic educational variables to be
used by the project.

The size and representativeness of this sample of children is unrivalled.
It can be used to make valid inferences about the development of the child
population of Britain from birth until the last year of compulsory schooling.
It is also a large enough sample to study satisfactorily the performance of
children when test scores are categorised into narrow intervals. It has
data covering a very wide range of child development, thus allowing
relationships between different aspects of development to be studied.

Because of its size, this sample can be expected to give fine discrimination
between alternative models. The distributional forms of error terms can
be studied in detail, as can various assumptions of independence between
such terms. Furthermore, the size of the sample allows one to appeal to
the laws of 'consistency' when making parameter estimates and carrying out
significance tests, so avoiding some of the difficult problems associated
with the usual maximum likelihood and related estimation procedures.

The applicant has been associated with the National Child Development
Study for over 10 years, and is at present engaged on methodological research
using these data under an NIE contract (No. 400-70-0041). The National
Children's Bureau has agreed to make available a data tape containing the
variables relevant to the proposed project, for the purpose of carrying
out the work. It has agreed to this on the grounds that because of
past involvement, the applicant has the necessary understanding and
experience of the data to pursue independent methodological research with

Outline of the Project

a) The data to be used are those collected at the three ages of
7, 11 and 16 years. The basic model can be written as follows
and is illustrated by the accompanying path diagram.

[Model equations and path diagram]

where x_i is the 7 year measurement on a child, y_i is the 11 year
measurement, and … the 16 year measurement, with the usual
meanings attached to the other symbols. The measurements
used, in turn, are tests of reading and mathematics attainments.


To begin with, it is clear that the above 'non recursive' system of
equations involves assumptions of linearity and additivity, and these
can readily be tested with the availability of such a large sample.
Transformations of the data will be studied, designed to satisfy
these assumptions. In addition, the distributions of the error terms
will be examined, especially with regard to normality and homoscedasticity
assumptions, and mutual independence of error terms.

The first major problem with this system arises when one wishes to
recognise the 'fallibility' of the measurements used. That is, if one
wishes to make inferences about 'true' underlying attainments as opposed
to inferences about relationships between observed scores, then the
unobservable 'measurement error' of the tests used must be incorporated
into the system. It is well known that the parameter estimates in the
above equations are inconsistent estimates of the parameters in the
corresponding equations relating the true attainments. In order to
provide 'good' estimates of these latter parameters, which are at least
consistent, further information must be provided. This may for example
be in the form of additional equations involving 'instrumental'
variables, or in the form of independent estimates of the variances and
covariances of the measurement errors. The first approach involves
further assumptions about independence of error terms, and these will
be studied. The second approach will yield consistent estimates, but
depends on either known population values of the variances and covariances
or good stochastic estimates. The latter are available for the
measurements used, and results with the two approaches will be compared,
thus providing further checks on particular assumptions. An early
analysis along these lines is described by Fogelman and Goldstein (1976).
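The inconsistency of the observed-score estimate, and the two remedies just described, can be sketched on simulated data. This is a minimal illustration rather than the project's procedure: a single fallible predictor, a second fallible indicator used as the instrument, and a known measurement error variance are all assumptions, and every name below is hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def slopes(x_obs, y_obs, z_instr, error_var_x):
    """Three estimates of the slope of y on the true (error-free) x."""
    ols = cov(x_obs, y_obs) / cov(x_obs, x_obs)                        # attenuated
    corrected = cov(x_obs, y_obs) / (cov(x_obs, x_obs) - error_var_x)  # known error variance
    iv = cov(z_instr, y_obs) / cov(z_instr, x_obs)                     # instrumental variable
    return ols, corrected, iv


def simulate(n=20000, beta=0.8, error_var_x=0.25, seed=1):
    """Simulated scores: true x has unit variance, all errors are independent."""
    rng = random.Random(seed)
    x_obs, y_obs, z_instr = [], [], []
    for _ in range(n):
        true_x = rng.gauss(0.0, 1.0)
        x_obs.append(true_x + rng.gauss(0.0, error_var_x ** 0.5))
        z_instr.append(true_x + rng.gauss(0.0, 1.0))   # second fallible indicator
        y_obs.append(beta * true_x + rng.gauss(0.0, 0.5))
    return x_obs, y_obs, z_instr


if __name__ == "__main__":
    x, y, z = simulate()
    print([round(v, 3) for v in slopes(x, y, z, error_var_x=0.25)])
```

With these settings the observed-score slope converges to beta multiplied by the reliability of x (1 / 1.25 = 0.8), while the other two estimates converge to beta itself, provided the instrument's error is independent of the errors in x and y.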



On top of this basic model, explanatory variables will be introduced
at each age and their relationship with the dependent variables
examined. For some of these variables, such as family size, there is
interest also in the effects of changes in the variable between ages,
and the most useful method of incorporating such change variables into
the model will be investigated. Finally, a 'bivariate' model will
be examined where both reading and mathematics scores at each age
are incorporated into the model. With these more complex systems
of equations, the large sample size will again permit careful
examination of alternative model assumptions.

b) Most of the methodological literature on longitudinal analyses
deals with continuous measurement data. Some work, using log-linear
models, has also been done for discrete or categorised data
(Goodman, 1973). Of considerable interest, however, is the relationship
between the two approaches. For example, to some extent assessment
and progress through an educational system is based upon
categorisations of essentially continuous underlying abilities, and
some of the consequences of this will be considered by comparing the
relationships across time of educational categories imposed by teachers
(as expressed in ratings) and the relationships between continuous
variables described above. The comparison will also be carried
out using categorisations of the continuous variable measurements
themselves. The size of the sample will allow useful comparisons to
be made between these approaches, and attention will also be paid to
the little discussed problem of measurement error in categorical data.



c) Many educational data consist of item rating scales, for example
for behaviour or academic motivation. Where a set of item ratings is
intended to reflect an underlying attribute, for example behaviour
towards a teacher, it is convenient to allot scores to the item categories to
give an overall score for each child. These overall scores can then be treated
as pseudo-continuous in subsequent analyses. Various procedures can be
used to estimate the scores, both cross-sectionally and taking account
of the longitudinal nature of the data. A detailed discussion is given
by Healy and Goldstein (1976) and the project will extend their results
in two directions.

First, by applying the techniques to the National Child Development
Study data on behaviour and academic motivation. In particular,
scales will be related across ages in order to devise scoring systems
which agree as closely as possible at each age, and to compare these
with those derived separately at each age. The techniques will also be
used in order to search for meaningful sub-scales. Secondly, to
extend the techniques by looking at the possibility of alternative
'constraint' systems suggested by the data, to look at the possibility
of 'rotating' estimated vectors, and to study the distributional
properties of the scales. As before, the extensiveness of the data will
enable a proper assessment to be made of the practical usefulness of
these different scoring systems.

References

1. Fogelman, K.R. and Goldstein, H. (1976) Social Factors Associated
with Changes in Educational Attainment Between 7 and 11 Years of
Age. Educational Studies, 2, 95-109.

2. Goldstein, H. (1976) A Study of the Response Rates of Sixteen
Year Olds in the National Child Development Study. In Britain's
Sixteen Year Olds, Ed. K. Fogelman, National Children's Bureau, London.

3. Goodman, L.A. (1973) Causal Analysis of Data from Panel Studies
and Other Kinds of Surveys. Amer. J. of Sociol., 78, 1135-1191.

4. Harris, C.W. (1963) (Ed.) Problems in Measuring Change. Univ. of
Wisconsin Press, Madison.

5. Healy, M.J.R. and Goldstein, H. (1976) An Approach to the Scaling
of Categorised Attributes. Biometrika, 63, 219-229.

6. Joreskog, K.G. (1970) A General Method for the Analysis of
Covariance Structures. Biometrika, 57, 239-251.

7. Joreskog, K.G. and Sorbom, D. (1976) Statistical Models and Methods
for Analysis of Longitudinal Data. Research Report 76-1,
Department of Statistics, Univ. of Uppsala.



APPENDIX 2 



THE NATIONAL CHILD DEVELOPMENT STUDY

The National Child Development Study (hereafter called NCDS) consists
of a cohort of around 17000 children comprising all births in England, Wales
and Scotland in the week 3rd - 9th March 1958. The initial purpose of the
study was to examine social and obstetric factors associated with still birth
and death in early infancy. The children were followed up at ages 7, 11 and
16, generally around the times of change of school institution at ages 7 and
11 and at 16 during their last compulsory year at school, theirs being the
first year group for whom the minimum school leaving age was 16 years.
Extensive social, education and medical data were collected at each age, a
description of the 16 year data being given in Fogelman (1976). The response
rate was high throughout the study, an overall response of 91, 91 and 87 per cent
being obtained at each of the three ages. Goldstein, in an analysis of the
characteristics of the non-respondents at 16 at previous ages in an appendix
to Fogelman (1976), showed that the biases due to the complete non-respondents
are small.

At present another follow up is being made of the cohort, now aged 23. 

For the present study a subset of the variables at ages 7, 11 and 16
was used. At each age this included the region and characteristics of the
school, information about the child's home including the numbers of brothers
and sisters, social class, number of persons per room, amenities and whether
the father stayed on at school after minimum school leaving age. At 7 years
a multi-item description of behaviour in the home was obtained from the parents
and educational information included tests of reading and arithmetic, teacher
ratings of a number of attainments and information on special educational provision.
At 11 years and 16 years information under each of the above headings was
recorded together with, at 11 years, tests of general ability broken down into
verbal and non-verbal components and of performance on a copying designs test
and at 16 years an inventory of attitudes towards school.

In all 212 variables were selected. The analyses reported here concentrate
mainly on the educational and home background data.

APPENDIX 3

THE CLASSICAL TEST MODEL AND THE LATENT VARIABLE MODEL COMPARED

By Russell Ecob

Two models concerned with measurement error, the Classical Test Model
and the Latent Variable Model, are described, the differences highlighted and
the necessary assumptions for the equivalence of the two models given.

Let $x_{ijk}$ be a measurement of individual i on a test j at occasion k.
A simple model of measurement error is that

    $x_{ijk} = \xi_{ijk} + u_{ijk}$

where $u_{ijk}$ is the measurement error and $\xi_{ijk}$ is the 'true' value. The 'true'
scores are seen to be completely defined by the measurement error giving a
tautologous model (Kempf, 1980) and for the model to have any meaning it is
required that the factors which are considered to contribute to the measurement
error variation be explicitly defined. This we will attempt later.

The Classical Test Model (Lord and Novick, 1965, 1968) makes the following
assumptions about the quantities

T1    …_K( … )
T2    …_K( … )
T3    E_K( … )
T4    Var_K( … ) = …
T5    …_K( … )


Note that all expectations are taken over a hypothetically infinite number
of replications. Particular features of this model are that the variance of the
measurement errors is assumed to be independent of the person and therefore
of the true score value also. As all quantities are independent of i they hold
also when summed over the population of persons. T1, called the assumption of
experimental independence between items, is crucial to this and the next model
and will be examined in the latter context.

*Note change in notation


The Latent Variable Model has certain differences from the classical
test model. Here a vector of L latent variables, $\eta$, accounts for the
covariation between individuals at a given point in time, k. Thus for a given
item, j, and person, i, we have

    $x_{ijk} = \lambda_j^{T} \eta_i + u_{ijk}$

where $\lambda_j = (\lambda_{j1}, \ldots, \lambda_{jL})$ are the loadings on the L latent variables
$\eta_i = (\eta_{i1}, \ldots, \eta_{iL})$ and $u_{ijk}$ is the measurement error. The
following definitions hold dropping the suffix, k. The latent variable
values and loadings are viewed as independent of the replications, k.

L1    $\operatorname{Cov}(u_{ij}, u_{ij'}) = 0$
L2    $\operatorname{Cov}(u_{ij}, \eta_{il}) = 0$
L3    …
L4    $\operatorname{Var}(u_{ij}) = $ …

Here the expectations are taken over a hypothetically infinite population
of individuals (i). The variances of the latent variables are fixed by fixing,
say, … for person i = 1 and item j = 1.

The $\eta$ may comprise both general and test specific latent variables,
the latter having non-zero loadings only for a certain group of tests sharing
a common characteristic. Under the unidimensional assumption, L = 1, and
this will be assumed for the present discussion.

For the latent variable approach to be related to the classical test
model the following condition, known as experimental independence between
persons, needs to be added to the latent variable model

    $E_K(u_{ijk} u_{i'jk}) = 0$  or  $\operatorname{Cov}_K(u_{ijk}, u_{i'jk}) = 0$,  $i \neq i'$

This is the condition that the covariance of errors of any two individuals
over replications is zero, and for instance in a group test would assume no
mutual influence to occur on test scores. This would not be expected to hold
if cheating or other test-related mutual interaction was involved. Unless
this condition holds, the commonly calculated reliability coefficients
based on a single trial, based on factor or latent trait analyses, will
often be overestimates of the true reliability as will be shown later.

Conversely, Guttman (1945, 1969) shows that the above condition
of experimental independence between persons in infinite populations of
persons leads to the true scores being independent of the particular trial
and thus to parallelism of trials. The assumption of experimental independence
between persons is crucial in the context of assessing the dimensionality of
tests and items or of the number of latent traits (McDonald, 1981).

Turning to the item domain the analogous assumption is that of
experimental independence between items mentioned earlier, which applies to
both classical test theory (T1) and to latent trait theory (L1). This will
not hold if the response to an item or test is dependent on the responses to
previous items or tests. The assumption is necessary in order that a test with a
calculable reliability can be constructed by selection from a pool of items on
which reliability coefficients have been independently calculated, or that
internal consistency measures of reliability are consistent.

A modification to latent trait theory to allow for the relaxation
of the assumption of experimental independence between items is made by
Guttman (1953) and Kempf (1977). McDonald (1981) gives a full discussion
of the assumption under the name local independence, in relation to the
dimensionality of tests and items. Even these extensions, however, do not
allow the response to a given item by different persons to be differentially
dependent on previous items. This could arise in test situations where a
person related variable such as fatigue or test anxiety may relate to
performance in the same way for different persons but may itself be
differentially affected in different people by a given item.



Reference

Kempf (1980): Paper read to session on educational applications of latent
trait models at the fourth international symposium on educational testing,
Antwerp, Belgium, June 1980.




The Distribution of Variables and Transformations by Russell Ecob

A requirement of the maximum likelihood methods for estimating
latent structures on multivariate data described in section 2.4 is that the
data are multivariate normal. When the observed data do not possess
multivariate normality, transformations may, in certain cases, produce this
property or an acceptable approximation. A necessary and sufficient condition
for multivariate normality is that every linear combination of the variables
is normally distributed. In addition, all the normally distributed marginal
variables are linearly related as are any linear combinations of these. Thus
the multivariate normal distribution has strong linear properties and it is
this which allows the use of reasonably straightforward linear statistical
techniques. But what if no transformation of the data will produce multivariate
normality? We then have a choice, given that we wish to transform the data,
of either achieving linearity of relationships between variables, which can
always be done by non linear transformations of individual values, or of
obtaining normality of each marginal distribution separately by the same
method but sacrificing the linearity of relationships between variables. We have
also the further alternative of using transformations within a certain class, e.g.
the one or two parameter or shifted power transformation of Box and Cox (1964).
In this case neither of these desirable properties may hold. We therefore
need to ask whether we are justified in making these non-linear transformations
and if so, which property, marginal normality or linearity, we should regard
as more important. First of all we distinguish non linear and parametric
transformations.
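As an illustration of the parametric class mentioned above, the following sketch implements the one parameter power transformation and its shifted (two parameter) form, together with the usual profile log-likelihood for comparing values of the power under a normal model. The function names and the grid of candidate powers are assumptions made for illustration, not part of the report.

```python
import math


def box_cox(y, lam, shift=0.0):
    """Power transformation of a single value; shift gives the two parameter form."""
    v = y + shift
    if v <= 0.0:
        raise ValueError("y + shift must be positive")
    if abs(lam) < 1e-12:
        return math.log(v)          # limiting case as lam tends to 0
    return (v ** lam - 1.0) / lam


def profile_loglik(ys, lam, shift=0.0):
    """Normal-theory profile log-likelihood of lam, up to an additive constant."""
    t = [box_cox(y, lam, shift) for y in ys]
    n = len(t)
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n
    # The second term is the log Jacobian of the transformation.
    jacobian = (lam - 1.0) * sum(math.log(y + shift) for y in ys)
    return -0.5 * n * math.log(var) + jacobian


def best_power(ys, grid=(-1.0, -0.5, 0.0, 0.5, 1.0), shift=0.0):
    """Candidate power on the grid with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(ys, lam, shift))
```

A transformation chosen in this way only approximates marginal normality within the family; it does not by itself guarantee linear relationships between variables.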


Use of Non Linear Transformations and Parametric Transformations

We define non linear transformations to be those which transform
particular ordinal values to a particular interval scale individually.
The variable will be transformed to a particular distributional form, be it
one of the theoretical distributions or the distribution, maybe arbitrary,
of another variable when a linear relationship to this variable will result.
We distinguish between these transformations and the parametric transformations
which transform the variable through a parametric relation, for example, the
transformation set by Box and Cox (1964).



The non linear transformations are equivalent to the scaling of
ordered categorical data (Kendall and Stuart, 1973) and assume that no information
regarding the interval scale properties of the raw data is regarded as relevant
to the analysis (indeed any interval relationships can result from such a
transformation). However, the transformed data are regarded as having
interval scale properties. For instance, the difference between the m th and
m + r th order statistics and the n th and n + r th order statistics is assumed
to have a certain value as well as a certain sign, and a person who in a test of attainment
taken at two occasions improves from the m + r th position to the m th position
generally has a different degree of improvement from someone who improves from
the n + r th position to the n th for m ≠ n. Thus the characteristics of the
distributional form to which the data are transformed are deemed to be relevant
to the data, the raw scores being arbitrary apart from the ordering relation.
This distinguishes the use of these transformations from the use of non-parametric
techniques making no assumption on the data beyond the ordering of scores and
giving an equal improvement to the two persons mentioned above. Bock (1975)
takes the position that, given the extent of theoretical and empirical arguments for
normality, the use of non-parametric techniques is not generally justified apart
from in small sample tests of the null hypothesis. We will generally be working
with large samples and examining complex relationships and generally endorse this line.


Non Linear Transformations and their Justification

We confine the following discussion to scores on tests of attainment
or ability. It is common to transform test scores to a standard normal
distribution. Indeed, no test is marketed without 'standardising' on a suitable
population. Therefore when using a test on a random sample from a population
probably different from that on which the test is standardised, perhaps in regional
and demographic characteristics and in 'up-to-date-ness', it may be justified to
restandardise the observed scores to a normal distribution. In doing this we are
usually saying that we regard the standardisation as inappropriate and so cannot
make any inferences on the relation of our population to the standardised one.
Alternatively, we examine the relation of our population to the standardising one
by comparing the standardised scores from the test manual with those from the separate
restandardisation of our distribution or, less strongly, compare the mean of
our distribution standardised according to the test standardisation with the test
standardisation distribution to infer relative overall level of attainment in
our population.

These transformations assume that the distribution of the appropriate
attainment or ability in the population is normal.
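Restandardising observed scores to a normal distribution can be sketched as a rank-based transformation. The version below is one common convention and an assumption here, not necessarily the procedure used in the report: tied scores share their average rank, and rank r out of n is mapped to the normal quantile at (r - 0.5) / n.

```python
from statistics import NormalDist


def normal_scores(raw):
    """Map raw scores to standard normal scores using only their ordering."""
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the block while the next sorted score ties with the current one.
        while end + 1 < n and raw[order[end + 1]] == raw[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0      # 1-based average rank of the block
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Only the ordering of the raw scores is used, so any interval information in the original scale is discarded, as the discussion above points out.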

An alternative non linear transformation is that which allots an age-specific
attainment (e.g. reading age) to an individual. Here a test is given to a
population of varying ages and each raw score is allotted a value corresponding
to the age whose average attainment is this particular score. This will not in
general give a normal distribution of scores, particularly in an attainment such as
reading where progress is not constant with age and where certain experiences, difficult
to acquire at a particular age, may be necessary in order to achieve a certain score.
Moreover this form of transformation may not be suitable when a test is given for its
diagnostic as opposed to its placement value.

A further use of non linear transformations is to transform to a linear
relationship with another, perhaps previously transformed, variable. This may be
done where further examination of the relationship between the two variables is of
interest and there is no external evidence that this should not be linear.




The necessary question to ask here is why it is one variable rather
than the other which is transformed. Again convenience of statistical analysis
would lead in a conditional analysis to the dependent variable first being
transformed to normality and then the independent variable being transformed to
linearity with it. The conditional distribution of the dependent
variable is then normal, though not of constant variance for all independent
variable values unless this in turn is normal, and the generalised least squares
estimation procedure will produce optimal estimates.

A further possibility is for both variables to be transformed by canonical
methods in order to maximise the correlation between them (see Kendall and Stuart
(1973), Vol 2). Here the relationship between the two transformed
variables is linear though neither of the distributions has a predetermined form.
However, if variables can be transformed to joint normality these will have
the maximum correlation and so, when this property holds, all
the methods described provide the same transformation. A natural reservation about
this approach is the maximisation of a quantity, the correlation, which is the
only quantity further analysed. This describes the relationship between the
variables which is often the focus of interest. However, if one rejects
this approach one has to adopt some priority on the variables included in the
analysis, first by fixing the distribution of one variable or alternatively by
fixing the marginal distributions of all variables to known form.

A second difficulty is that of applying the canonical analysis to more than
two variables. A possible interpretation is that given by Healy and Goldstein (1976)
of minimising the sum of squared deviations from the assumed underlying value
at a given time of several indicators, the summation being over a number of time
periods. This again assumes a linear relationship between the values of the
underlying variable at different times.




Linearity or Marginal Normality?

In the fields in which these non linear transformations are usually
applied we generally have no theories which specify particular
distributional forms or relationships between the variables in question. We
may contrast this with the case in physics where a particular law relates the
height dropped from and final velocity of a mass in a vacuum. Both these
quantities are measured according to measures with known interval scale
properties, and the non linear transformations (say to linearity between
variables) would be inappropriate (the relationship is thought to be quadratic)
and the parametric transformations would only be used here in order to provide
a more powerful test of the relationship given particular (ordinary least
squares) statistical techniques.

Also relevant in this connection is the change of relationship from
quadratic to linear when using a quadratic transformation on one of the variables.
One degree of freedom seems to have been gained in the testing of the relationship
and should have been taken into account in the transformation. Thus non linear
transformations reduce the number of degrees of freedom in the data by
the number of values on each transformed variable which are independently
transformed. Thus degrees of freedom for testing only remain when a number of
persons score at the same value on a test.

Transformations giving linear relationships may generally be justified
if no reasons are hypothesised for the relation to be non linear or, indeed, where
non linearity of relationships between variables would be uninterpretable. Thus
in the case of reading attainment, when it is not anchored in terms of age
equivalent scores or any other external properties, any non linearity may have
no reasonable interpretation, the relation of scores between ages being
completely described by the correlation. The effect of other variables, e.g.
social data, home background, school characteristics, on this as a criterion may
then be examined.



Normality of distribution is widely encountered in naturally occurring
distributions (e.g. height) and is known by the Central Limit Theorem to result
from the sum of a large number of randomly varying quantities. Thus a long
test on which the response to successive items is independent should produce
a normal distribution of scores. Failure to do so may be due either to the
test being too short (e.g. less than 50 items), to non-independence of responses
to items, or to a large proportion of the items being generally too easy or
too hard. In reality all of these explanations usually hold, particularly the
second. However, even with items which vary in difficulty it can be argued
that given a suitable choice of items a test of infinite length will have
a normal distribution. A weaker argument is that the ability or attainment
in question is thought to represent the sum of a large number of randomly varying
influences and thus be normally distributed.
In this case the test will be regarded as having a non-normal distribution for
reasons only of faulty test construction. The above argument, however,
assumes a homogeneous population. What if there are, say, two sub-populations
each having different
mean attainments at a particular age? Under the previous argument we have
a combination of two normal distributions with different means giving non-normality
overall. The weaker argument of normal distribution of ability assumes that there
are a very large number of such sub-populations corresponding to divisions of the
overall population on different characteristics and none have differences in their
means substantially larger than the others.
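
The Central Limit Theorem argument can be illustrated with a small simulation. This is a sketch under assumed conditions (independent items of equal difficulty, 60 items, 5000 simulated examinees); the function names and parameter values are hypothetical and not part of the study.

```python
import random

def simulate_scores(n_examinees, n_items, p_correct, seed=0):
    """Total scores on a test of independent items of equal difficulty."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(n_items) if rng.random() < p_correct)
            for _ in range(n_examinees)]

def skewness(values):
    """Moment coefficient of skewness (zero for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Items of middling difficulty: scores should be close to symmetric.
skew_balanced = skewness(simulate_scores(5000, 60, 0.5))
# Items that are generally too easy: scores pile up near the maximum,
# leaving a tail of low scores (negative skew).
skew_easy = skewness(simulate_scores(5000, 60, 0.95))
```

Under these assumptions the balanced test gives a skewness near zero, while the too-easy test gives a clearly negative skewness, the ceiling effect discussed in the text.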

The most common reason for a non-normal distribution of scores in a
particular test is the use on an inappropriate population making the test either
too difficult or too easy. Then the spread of scores at one extreme of the
range is not sufficient to differentiate the assumed real difference in ability
or attainment, giving a skewed distribution. This is the case in the reading
test at 7 years used in the NCDS and results from defective test standardisation
by the test producers (as the NCDS sample is a national one and effectively
random) and is to a certain extent true of the reading test at 16 years (which
was standardised originally on an 11 year sample).




In all cases a variety of possible transformations will give linear
relationships between two variables, each giving differing correlations.
Whereas in the bivariate normal case the appropriate transformation is obvious,
in other cases we have to play off statistical convenience, which leads to the
dependent variable first being transformed to normality, against a desire to
place limits on the degree of non normality of all the distributions concerned
or to maximise the correlation. When there is more than one dependent variable
it may not be possible to ensure that each is normal and at the same time allow
linear relationships between them.

Why do we obtain Non-Multivariate-Normal Distributions? 

We have argued that it rarely makes sense to assume non linear relationships
between attainments at different occasions and that when a test has a non-normal
distribution it is generally admissible to transform the scores to have a normal
distribution, this then being representative of the distribution of ability or
attainment in the population. Why then do we not always obtain multivariate normal
distributions?

One reason has been suggested earlier. It is that the population is not
homogeneous. Other explanations have to do with the nature of the ability
or attainment tested and the nature of its development. It is difficult to
imagine that an attainment, say, of reading, will have the same nature at different
ages. At age seven the skills learnt will be more to do with the recognition of
individual words, whereas later, at age 11, they will have to do more with the
solving of complex syntactical problems. A word recognition test may be more
appropriate to the seven year old and a reading comprehension test more appropriate
to an 11 year old. However, these different attainments may require other attainments
(e.g. world and subject knowledge) before they are able to form the basis for
further development of reading attainment. If a skill on one attainment, say
on reading, is gathered at the expense of another, say, world knowledge, then
above a certain stage a high attainer at age seven may not be expected to maintain
his high position relative to the rest of the sample (note: this is a different
argument from regression to the mean) and so the relation between attainments
between occasions will be non linear. This argument can, however, also be used
to justify a natural ceiling on a particular attainment at a particular age and if
at both occasions natural ceilings existed, then the relationship of attainments
between occasions could be again linear.

A further possibility is a defective test. This could be a test which
does not correctly order the subjects on the attainment supposedly measured
and which does this in a non-random way (it was seen earlier that if the error
distribution was the same as the distribution of observed scores then the
relationship between the true scores on two tests is the same as that between
the observed scores). This could be due to particular test-related factors
which affect different subjects or different subpopulations differentially.
Possible examples are test anxiety, where a test may cause generally high
anxiety, perhaps because of unfamiliarity, and reduce disproportionately the
scores in highly anxious subjects; boredom or tedium is another possibility,
or the use of words which are only familiar to members of a certain subpopulation
or region.



APPENDIX 5a 



THEORY OF INSTRUMENTAL VARIABLES ESTIMATION 



By Russell Ecob



GENERAL THEORY

Let $X_{1i}$, $X_{2i}$ be the observed values of test score variables measured
as deviations from their means at the first and second occasions.
They are the predictor and dependent variables respectively in a simple
linear regression model. Let $T_{1i}$, $T_{2i}$ be their true values
and $e_{1i}$, $e_{2i}$ be the errors of observation of the ith subject
(i = 1, ..., n).

Thus we have

    $X_{1i} = T_{1i} + e_{1i}$    (1)

    $X_{2i} = T_{2i} + e_{2i}$    (2)

and a model relating the true values at each occasion is

    $T_{2i} = \beta T_{1i} + u_i$    (3)

Let $Z_i$ be the observed value of another variable, called the instrumental
variable.

    $b_{IV} = (\sum Z_i X_{2i})(\sum Z_i X_{1i})^{-1}$    (4)

is called the instrumental variable estimator of the regression coefficient
$\beta$ (Johnston, 1972).

From (1), (2), (3) we have

    $b_{IV} = (\beta \sum Z_i T_{1i} + \sum Z_i u_i + \sum Z_i e_{2i})(\sum Z_i X_{1i})^{-1}$    (5)

and

    $b_{IV} - \beta = (\sum Z_i u_i + \sum Z_i e_{2i} - \beta \sum Z_i e_{1i})(\sum Z_i X_{1i})^{-1}$    (6)

By letting the sample size tend to infinity, we have

    $\lim_{n\to\infty}(b_{IV} - \beta) = \lim_{n\to\infty}(\sum Z_i u_i)(\sum Z_i X_{1i})^{-1} + \lim_{n\to\infty}(\sum Z_i e_{2i})(\sum Z_i X_{1i})^{-1} - \beta \lim_{n\to\infty}(\sum Z_i e_{1i})(\sum Z_i X_{1i})^{-1}$    (7)

Given that $\lim \sum Z_i X_{1i} \neq 0$, the condition for $b_{IV}$ to be consistent is
therefore

    $\lim(\sum Z_i u_i + \sum Z_i e_{2i} - \beta \sum Z_i e_{1i}) = 0$    (8)

The first term represents the covariance of the instrumental variable
with the disturbances, $u_i$, the second the covariance with the error of
observation of $X_{2i}$, and the third the covariance with the error of
observation of $X_{1i}$. Attention has traditionally been focused mainly on
the second two terms in this expression: indeed many reviews (for example
Kendall & Stuart, 1977 Chapter 29, Madansky, 1959) limit their attention
mainly to the "structural relationship" case where $T_{2i} = \beta T_{1i}$, and so
the first term in (8) is identically zero. When this is the case it
may often be possible to make a judicious choice of Z so that the
second and third terms roughly cancel each other out.
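
The contrast between the two estimators can be sketched with a simulation. Everything below is an assumed illustration: the value $\beta = 0.8$, the error variances, the sample size and the way Z is generated (related to the true score but independent of u, $e_1$ and $e_2$, so that condition (8) holds) are hypothetical choices, not NCDS quantities.

```python
import random

def centred(values):
    """Express values as deviations from their sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cross(a, b):
    """Sum of cross-products of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(1)
n, beta = 20000, 0.8
t1 = [rng.gauss(0, 1) for _ in range(n)]                 # true first-occasion scores
x1 = [t + rng.gauss(0, 0.5 ** 0.5) for t in t1]          # Var(e1) = 0.5, reliability 2/3
x2 = [beta * t + rng.gauss(0, 0.5) + rng.gauss(0, 0.5)   # disturbance u plus error e2
      for t in t1]
z = [t + rng.gauss(0, 1) for t in t1]                    # instrument, independent of u, e1, e2

x1, x2, z = centred(x1), centred(x2), centred(z)
b_ols = cross(x1, x2) / cross(x1, x1)   # attenuated towards beta * reliability
b_iv = cross(z, x2) / cross(z, x1)      # equation (4)
```

With these assumed settings the ordinary least squares slope settles near $0.8 \times 2/3$, while the instrumental variable estimate settles near 0.8.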

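In terms of the sample correlations, $r_{zu}$, $r_{ze_2}$, $r_{ze_1}$ between the
instrumental variable Z and the disturbance and errors of measurement
of $X_2$, $X_1$ respectively, we have

    $b_{IV} = \beta + (\sigma_u r_{zu} + \sigma_{e_2} r_{ze_2} - \beta \sigma_{e_1} r_{ze_1}) / (\sigma_{x_1} r_{zx_1})$

and if the reliability of $X_1$ is R, the expression becomes

    $b_{IV} = \beta + \left(\frac{\sigma_u}{\sigma_{e_1}} r_{zu} + \frac{\sigma_{e_2}}{\sigma_{e_1}} r_{ze_2} - \beta r_{ze_1}\right) \frac{(1-R)^{1/2}}{r_{zx_1}}$

Expression (4) shows that if the predictor and dependent variables are
reversed the instrumental variable estimator becomes its reciprocal.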



2.2 The Efficiency of Instrumental Variable Estimators

We have

    $\mathrm{Var}(b_{IV}) = \sigma^2 \sum Z_i^2 / (\sum Z_i X_{1i})^2$

If $b_{OLS}$ is the ordinary least squares regression coefficient defined as
$b_{OLS} = \sum X_{1i} X_{2i} / \sum X_{1i}^2$, then $\mathrm{Var}(b_{OLS}) = \sigma^2 / \sum X_{1i}^2$.

The efficiency of $b_{IV}$ relative to $b_{OLS}$ is given by

    $\mathrm{Var}(b_{OLS}) / \mathrm{Var}(b_{IV}) = r_{zx_1}^2$    (10)

Thus the criterion for an efficient instrumental variable is that it
correlates highly with the predictor.
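
The identity in (10) follows directly from the two variance expressions and can be checked numerically. The data and the value of the residual variance `sigma2` below are hypothetical; `sigma2` cancels in the ratio.

```python
def centred(values):
    """Express values as deviations from their sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cross(a, b):
    """Sum of cross-products of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical instrument and predictor values.
z = centred([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])
x1 = centred([2.0, 1.0, 5.0, 4.0, 9.0, 6.0])
sigma2 = 1.0  # assumed residual variance; it cancels in the efficiency ratio

var_iv = sigma2 * cross(z, z) / cross(z, x1) ** 2
var_ols = sigma2 / cross(x1, x1)
efficiency = var_ols / var_iv

# Squared sample correlation between the instrument and the predictor.
r_squared = cross(z, x1) ** 2 / (cross(z, z) * cross(x1, x1))
```

Here `efficiency` and `r_squared` agree up to rounding error, whatever values are chosen for the data.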



2.3 The Use of Many Instrumental Variables

When we have p instrumental variables $Z_j$, j = 1, ..., p,

    $b_{IV} = (\sum_i \sum_j c_j Z_{ij} X_{2i})(\sum_i \sum_j c_j Z_{ij} X_{1i})^{-1}$

The combination of $Z_j$ which gives the most efficient estimate of $\beta$ can be
found by choosing $c_j$ so that $\mathrm{Corr}(\sum_j c_j Z_{ij}, X_{1i})$ is a maximum.
The $c_j$ are then the sample regression coefficients, $b_j$, of $X_1$ on $Z_j$, j = 1, ..., p.
Letting $\hat{X}_{1i} = \sum_j b_j Z_{ij}$ we obtain $b_{IV} = (\sum \hat{X}_{1i} X_{2i}) / (\sum \hat{X}_{1i} X_{1i})$.
The efficiency of the instrumental variable estimator is now the square
of the multiple correlation of the instrumental variable set with the
predictor.
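
A minimal sketch of this two-step calculation for p = 2 instruments follows. The simulated data, the value $\beta = 0.8$ and the function names are assumptions for illustration; the first-stage coefficients are obtained by solving the 2 x 2 normal equations directly.

```python
import random

def centred(values):
    """Express values as deviations from their sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cross(a, b):
    """Sum of cross-products of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def fitted_predictor(z1, z2, x1):
    """Regress X1 on two centred instruments and return the fitted values."""
    s11, s22, s12 = cross(z1, z1), cross(z2, z2), cross(z1, z2)
    s1x, s2x = cross(z1, x1), cross(z2, x1)
    det = s11 * s22 - s12 ** 2
    b1 = (s1x * s22 - s2x * s12) / det   # Cramer's rule for the normal equations
    b2 = (s2x * s11 - s1x * s12) / det
    return [b1 * a + b2 * b for a, b in zip(z1, z2)]

rng = random.Random(2)
n, beta = 10000, 0.8
t1 = [rng.gauss(0, 1) for _ in range(n)]
x1 = centred([t + rng.gauss(0, 0.5 ** 0.5) for t in t1])
x2 = centred([beta * t + rng.gauss(0, 0.5) + rng.gauss(0, 0.5) for t in t1])
z1 = centred([t + rng.gauss(0, 1) for t in t1])
z2 = centred([t + rng.gauss(0, 1) for t in t1])

x1_hat = fitted_predictor(z1, z2, x1)
b_iv = cross(x1_hat, x2) / cross(x1_hat, x1)
multiple_r2 = cross(x1_hat, x1_hat) / cross(x1, x1)       # efficiency of the set
single_r2 = cross(z1, x1) ** 2 / (cross(z1, z1) * cross(x1, x1))
```

In this sketch the multiple correlation of the pair of instruments with the predictor is at least as large as that of either instrument alone, so the combined estimator is at least as efficient.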






2.4 The Use of Dummy Variables as Instrumental Variables

The previous discussion has assumed the existence of instrumental variables
which can be modelled as having simple linear relationships with the first
occasion variable.

Two other cases can be distinguished. Firstly where an interval scaled
instrumental variable has a non-linear relationship to the first occasion
variable and secondly when the instrumental variable is categoric, for
example measured on an ordinal or nominal scale.

In the first case the non-linear relationship can be modelled, say
by a polynomial function, or the instrumental variable can be grouped into
R categories. In the latter case each category can be represented in the
usual way by a dummy variable. This takes the value 1 for this category
and 0 for every other category.



Let $X_{1r_k}$, $X_{1r_{k'}}$ be two observations on the first occasion variable which
belong to the same instrumental variable category. Using the dummy
instrumental variables to estimate the first occasion variable gives the
estimate $\hat{X}_{1r_k} = \hat{X}_{1r_{k'}} = \bar{X}_{1r}$, which is the mean value of all observations
in category r.

Thus

    $b_{IV} = (\sum_{r=1}^{R} p_r \bar{X}_{2r} \bar{X}_{1r})(\sum_{r=1}^{R} p_r \bar{X}_{1r}^2)^{-1}$    (11)

where $\bar{X}_{1r}$ is the mean of the $X_{1i}$ in category r (and similarly $\bar{X}_{2r}$), and $p_r$ is the proportion
in category r. This is essentially the "Method of grouping" as
introduced by Wald (1940).
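
The grouping estimator (11) can be sketched as follows. The function name, the category labels and the four observations are hypothetical, and both score lists are assumed to be already expressed as deviations from their overall means.

```python
def grouping_estimator(categories, x1, x2):
    """IV estimate using category dummies as instruments (method of grouping).

    x1 and x2 are assumed to be deviations from their overall means.
    """
    groups = {}
    for label, a, b in zip(categories, x1, x2):
        groups.setdefault(label, []).append((a, b))
    n = len(x1)
    numerator = denominator = 0.0
    for members in groups.values():
        p_r = len(members) / n                       # proportion in category r
        mean_x1 = sum(a for a, _ in members) / len(members)
        mean_x2 = sum(b for _, b in members) / len(members)
        numerator += p_r * mean_x1 * mean_x2
        denominator += p_r * mean_x1 ** 2
    return numerator / denominator

# Hypothetical two-category example, already centred.
categories = ["low", "low", "high", "high"]
x1 = [-3.0, -1.0, 1.0, 3.0]
x2 = [-2.0, -2.0, 1.0, 3.0]
b_grouped = grouping_estimator(categories, x1, x2)
```

With two categories the estimate reduces to the ratio of the difference in group means of $X_2$ to the difference in group means of $X_1$, here 4/4 = 1.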

Wald (1940), Neyman and Scott (1951) and Madansky (1959) have given
conditions for consistency of the grouping method. Necessary conditions
are that (a) the grouping of X is independent of the errors e and (b) that
the denominator of the right hand side of (11) does not approach zero
as the sample size tends to infinity. Random allocation to the groups,
for example, would satisfy condition (a) but not condition (b). One
way to ensure (a) would be to know the relative ordering of the true
values T. This is difficult, however, without knowledge of the true
values themselves which in general of course are unavailable.

The conditions for general instrumental variables are

    $\lim \sum Z_i e_i = 0$,    $\lim \sum Z_i X_{1i} \neq 0$



The necessary and sufficient condition for consistency based on ordering
by observed values is as follows. For any two groupings, let
$X_1(P_1)$ and $X_1(1-P_2)$ be the $P_1$ and $(1-P_2)$ percentiles of $F(X_1)$, the
distribution of observed values. If $[u, v]$ is the shortest interval
such that $P\{u \le e_1 \le v\} = 1$, i.e. if $v - u$ is the range of $e_1$, then
$b_{IV}$ is a consistent estimate of $\beta$ if and only if

    $P\{X_1(P_1) - v < T_1 < X_1(P_1) - u\} = P\{X_1(1-P_2) - v < T_1 < X_1(1-P_2) - u\} = 0$

This means that the range of $T_1$ must have "gaps" at appropriate places where
$T_1$ has a zero probability of occurring. Only if this is so can no
misgrouping occur with respect to the observed X's.

It is clear that this condition cannot hold if the errors of measurement
are normally distributed, due to the infinite range. However, if the range
is finite, careful sampling can ensure no observations occur in the
particular intervals. In particular, if the range of $e_1$ is
approximately known then an approximately consistent estimate can be
obtained.



The literature on grouping methods using observed values of $X_1$ has tended to focus
on conditions for consistent estimates rather than quantifying the
inconsistency of various grouping methods. The results on the NCDS data
go some way to remedying this situation for a particular
data set. There has, however, been work on the allocation to groups to
optimise efficiency. These are summarised in Madansky (1959). These methods
either assume that the variable is observed without error or use
simulation procedures with very high reliabilities for $X_1$. An example
of the latter is Nair and Banerjee (1942) who find that a division into
three groups using the extreme groups for estimation gave greater efficiency
and consistency than when a two group division into equal sized groups
was used.

The simulations in this case involved normally distributed errors whose
standard deviations were 10% of the (constant) distance between any two
adjacent true values. This gave conditions which approximated those given
for consistency by Neyman and Scott. However, the reliability was 0.9999,
seldom found in practice! Similar conditions are found on actual data
of Madansky (1959) and contrary conclusions are found by Kendall and Stuart
(1977) on simulated data by Brown (1957) of 9 normally distributed
observations with normally distributed error. Here Kendall and Stuart
show that the three groups method using extreme groups for estimation has
higher inconsistency than the two groups method.



2.5 The Use of Instrumental Variables where there is more than one first
occasion Variable

Equations (1), (3) generalise readily to p first occasion variables
where the jth variable

    $X_{1ij} = T_{1ij} + e_{1ij}$

and

    $T_{2i} = \sum_{j=1}^{p} \beta_j T_{1ij} + u_i$    (12)

We use instrumental variables $Z_{jk}$, k = 1, ..., n to estimate $X_{1j}$.






In order to obtain consistent estimates of the parameters $\beta_j$ we require an
analogue of the condition (8) for each predictor. In order to obtain
a set of efficient estimates we require the two conditions:

1. The instrumental variable set $Z_{jk}$ corresponding to each predictor
has a high multiple correlation with the predictor.

2. The instrumental variable estimators of different predictors
have low intercorrelations.

Clearly, condition 2 does not hold when the same instrumental variables
are used for more than one predictor.






There is no simple analogue of the formula (10) for the efficiency
for one instrumental variable, as the standard errors now depend on
the variance-covariance matrix of the estimates.



APPENDIX 5b



Instrumental Variable Methods for the Estimation of Test Score
Reliability



By



Russell Ecob & Harvey Goldstein
Department of Statistics & Computing
University of London Institute of Education
Bedford Way
London WC1



INTRODUCTION



In the following simple regression model,

    $y_i = \alpha + \beta x_i + u_i$    (1)

it is well known (Goldstein, 1979) that if the observed independent variable
x contains errors of measurement, and if we wish to estimate the regression
coefficient of the 'true' value of x, then the ordinary least squares (OLS)
estimator is inconsistent. The simplest and most common model relating the
true value to the observed value of x is (dropping the suffix i)

    $x = T + e$    (2)

where T is the true value, e the random error of measurement and $\mathrm{Cov}(T, e) = 0$.
It is supposed therefore that we wish to estimate the parameters $\alpha$, $\beta$ in

    $y = \alpha + \beta T + u = \alpha + \beta x + (u - \beta e)$    (3)

It is because x is correlated with $(u - \beta e)$ that the OLS estimator (b) in
(1) is an inconsistent estimator of $\beta$. A consistent estimator is given by
b/R, where R is known as the reliability of x and is defined as

    $R = \mathrm{Var}(T) / \mathrm{Var}(x)$    (4)

where $\mathrm{Var}(x) = \mathrm{Var}(T) + \mathrm{Var}(e)$.

In many situations, the value of R is very close to 1, and any adjustment to
the usual estimate can be safely ignored. In other applications, for example
in mental testing, R may be considerably less than 1 so that an adjustment
becomes necessary. In a linear model with several further independent variables,
the estimators of these too will be inconsistent if OLS is used, and consistent
estimates may be obtained by adjusting the observed covariance matrix of the
independent variables so that the observed variances corresponding to variables
containing measurement error have estimators of their measurement error variance
subtracted prior to inversion of the matrix etc., in order to calculate the
coefficients (Fuller et al, 1974). To do this, it is important to have
accurate and consistent estimates of the measurement error variances, or 
alternatively reliabilities, and in this paper we explore some new procedures for 
obtaining such estimates based on instrumental variable techniques. 
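
The single-predictor adjustment b/R can be sketched numerically as below. The function names and the variance and slope values are hypothetical and serve only to illustrate definition (4) and the correction.

```python
def reliability(var_true, var_error):
    """Reliability R = Var(T) / Var(x), with Var(x) = Var(T) + Var(e)."""
    return var_true / (var_true + var_error)

def corrected_slope(b_ols, r):
    """Adjust an OLS slope for unreliability of the predictor: b / R."""
    if not 0.0 < r <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return b_ols / r

# Hypothetical values: true-score variance 4, error variance 1, OLS slope 0.4.
r = reliability(var_true=4.0, var_error=1.0)
b_adjusted = corrected_slope(0.4, r)
```

With these assumed values R is 0.8 and the adjusted slope is 0.5; the nearer R is to 1, the smaller the adjustment.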



require that item responses are indeed determined by a single quantity for
each individual, such as given by (7). For the types of educational tests we
deal with in this paper, it seems even less likely that a unidimensional trait
is operating. A more detailed discussion of this topic is given by Goldstein
(1980). Secondly, equation (6), often known as the "local independence"
assumption, a priori seems somewhat unreasonable. It is difficult to imagine
that for a given individual, if he or she fails one item then the probabilities
of success on later items are the same as when he or she succeeds on the earlier
item. Nevertheless, there seems to have been little, if any, serious study of
this problem and the consequent effect of non-zero correlations on reliability
estimates. A further discussion of this point in the context of latent trait
models is given by Goldstein (1980). Thus, there is as yet no really satisfactory
method for obtaining a consistent estimate of reliability using "internal"
methods, nor even of providing a lower bound, and we suggest that estimates
based on these methods should be treated with some caution.



1.2 External Estimates of Reliability

The most obvious method of estimating reliability or measurement error variance
is to carry out repeat measurements. Thus, we have (dropping the suffix i),
for two applications of a test,

    $X_1 = T + e_1$
    $X_2 = T + e_2$    (8)

and

    $\mathrm{Var}(X_1 - X_2) = \mathrm{Var}(e_1 - e_2) = 2[\sigma_e^2 - \mathrm{Cov}(e_1, e_2)]$

For many physical measurements it is reasonable to assume independence of
measurement errors, i.e. $\mathrm{Cov}(e_1, e_2) = 0$,
so that we have

    $\sigma_e^2 = \tfrac{1}{2} \mathrm{Var}(X_1 - X_2)$    (9)
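
Equation (9) can be applied directly to paired repeat measurements. The sketch below assumes independent errors with equal variance on the two occasions, as in (9); the function name and the four pairs of scores are hypothetical.

```python
from statistics import pvariance

def error_variance_from_retest(first, second):
    """Estimate sigma_e^2 as half the variance of the paired differences."""
    differences = [a - b for a, b in zip(first, second)]
    return 0.5 * pvariance(differences)

# Hypothetical scores for four subjects on two applications of a test.
first = [10, 12, 9, 15]
second = [11, 11, 10, 14]
sigma2_e = error_variance_from_retest(first, second)

# Reliability of the first application: 1 - Var(e) / Var(X).
reliability_first = 1.0 - sigma2_e / pvariance(first)
```

For these figures the differences are -1, 1, -1, 1, giving an error variance of 0.5. If the errors were positively correlated, as the text goes on to suggest for mental tests, this calculation would understate the error variance.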



For mental tests, however, this usually will not be a reasonable assumption
due to the presence of memory effects, learning, etc. If more than one test
relating to the same thing is available, then by assuming suitable relationships
between the true scores on the tests it is possible to obtain reliability estimates.
The usual assumption is that the tests are congeneric so that we have, for a set
of p tests,

    $X_{ji} = a_j + b_j T_i + e_{ji}$    (10)



The observed covariance matrix of the $X_j$ contains $\tfrac{1}{2}p(p+1)$ elements and if
we assume $\mathrm{Cov}(e_j, e_{j'}) = 0$ and $\mathrm{Cov}(T, e_j) = 0$
the matrix is a function of the $b_j$ and error variances $\sigma_{e_j}^2$, which gives
2p parameters. Hence, for three or more tests, unique estimates, for example
maximum likelihood ones, are available. Details of this approach are given
in Joreskog (1971). Although it is not quite as serious as in the simple test-
retest case, this method also has the difficulty that the measurement errors of
the tests may be correlated, for example because of day to day fluctuations
among examinees etc. This immediately raises the question of definition of
true score, but we shall postpone discussion of that until a later section.
' }■ » ' • ' 

In section 2 we propose a generalisation of congeneric tests to include any
variable having non-zero correlation with the test whose reliability we wish to
measure.

Such an "instrumental variable" does not require any assumptions about
unidimensionality or independence and also, unlike the simple test-retest or
the congeneric test models, the possibility of choosing any variable means
that we can search for those which are likely to be uncorrelated with the
measurement errors. The possibility of dropping both these restrictive
assumptions is attractive and the remainder of the paper investigates this
problem using an extensive longitudinal data set.

2. The Data

The data come from the National Child Development Study (NCDS) which followed
up a cohort of 17 000 children born in one week of March 1958, at the ages of
7, 11 and 16. The children belonged to the first year-group for whom the
minimum school leaving age was 16 years. A description of the social and
educational data (among others) collated at these ages is given in Fogelman
(1976).

Testing was carried out by the class teacher. Since the study was
a national study of all children born in a particular week, most children
selected were tested in a different situation and by a different tester who also
scored the test.



Four possible situations giving rise to response variation are as follows:

1. The environment in which the test is administered

2. The process of test administration

3. The coding and scoring of the test (this includes the interpretation
of the correctness of the response)

4. Day-to-day variation in individual test performance

Since only one test of a given type was done by each child at each occasion,
the sources of variation 1-4 above are confounded. It is important, however,
to distinguish 'day to day' variation from changes in true score over time.

We can regard variation over time as contributing either to measurement error
or to true score variation or to both. A reasonable estimate of the true score
at a particular moment would be obtained from a moving average of scores taken
at successive time intervals before and after. The continuous change in true
test score over, say, a week is therefore regarded as being supplemented by
random error to produce the observed day to day variation. The various
educational measurements in the NCDS were completed within a week for each child
so that any true score changes over not more than a one week period are
effectively regarded as part of day to day variation.

In addition to these sources of measurement error, there will typically
remain an unexplained variation which can be conceptualised as the variation
between the response to an item and its hypothetical replication.

Cronbach et al (1972) argue that test evaluation or "generalisability" studies 
which also view a particular test as a sample from a universe of tests and 
which use experimental designs to estimate individually the above components 
of variation, should be carried out prior to test administration. 

We use here a "test-specific" interpretation of true score which treats true
score as relevant only to the particular test. A justification for this is
given by Goldstein (1979), although the methods used in this paper can be
extended to a full 'generalisability' approach.

Goldstein (1979), using the same NCDS data, also drew attention to the use of 
instrumental variables in estimating the relation between mathematics and 
reading attainments, when measured at different ages. He emphasised the 
potential usefulness of this method when imprecise prior- knowledge about the 
reliability of the earlier attainment scores is available, and pointed out that 
little was known about the degree to which the instrumental variables used 
satisfied the conditions of consistency.

In this paper the properties of a variety of instrumental variables are examined
in the context of the regression of 16 years attainment on 11 years attainment
for mathematics and reading test scores separately. Comparisons are made with
the use of ordinary least squares and also with the use of the internal
estimates of the reliability coefficient for the 11 year attainment given
in Goldstein (1979).



3. Theory of Instrumental Variables Estimation

3.1 General Theory

Let $X_{1i}$, $X_{2i}$ be the observed values of test score variables measured as
deviations from their means at the first and second occasions and let them be
the predictor and dependent variables respectively in a simple linear regression
model. Let $T_{1i}$, $T_{2i}$ be their true values and $e_{1i}$, $e_{2i}$ be the errors of observation
or measurement errors for the ith subject (i = 1, ..., n).

Then we have, as before,

    $X_{1i} = T_{1i} + e_{1i}$    (11)

    $X_{2i} = T_{2i} + e_{2i}$    (12)

and a model relating the true values at each occasion is

    $T_{2i} = \beta T_{1i} + u_i$    (13)

Let $Z_i$ be the observed value of another variable, called the instrumental
variable.

Then

    $b_{IV} = (\sum Z_i X_{2i})(\sum Z_i X_{1i})^{-1}$    (14)

is called the instrumental variable estimator of the regression coefficient $\beta$
(Johnston, 1972).

From (11), (12), (13) we have

    $b_{IV} = (\beta \sum Z_i T_{1i} + \sum Z_i u_i + \sum Z_i e_{2i})(\sum Z_i X_{1i})^{-1}$    (15)

and

    $b_{IV} - \beta = (\sum Z_i u_i + \sum Z_i e_{2i} - \beta \sum Z_i e_{1i})(\sum Z_i X_{1i})^{-1}$    (16)


In terms df the sample correlations, r^, r & , r between the instrumental 

variable Z and the disturbance and errors of measurement of X ? , X, respectively 
and the reliability R of X 



3 IV 



1 a 

o e i 

6 ' ( T- r zu + T 1 r z<- " Br zr > iL-JO 
e i e i . 2 1 r zx, 



2' "I 
(17) 



where $\sigma_{e_1}$, $\sigma_{e_2}$, $\sigma_u$ are respectively the standard deviations of the errors on
the 1st occasion, 2nd occasion and the disturbance term. As the sample size
tends to infinity, the following consistency condition is obtained, with $\rho$
denoting the corresponding population correlations:

$\sigma_u \rho_{zu} + \sigma_{e_2} \rho_{ze_2} - \beta \sigma_{e_1} \rho_{ze_1} = 0$    (18)



Equation (14) shows that if the predictor and dependent variables are
interchanged, the instrumental variable estimator becomes its reciprocal.
We note also that the efficiency of the instrumental variable estimator with
respect to the ordinary least squares estimator is $r_{zx_1}^2$ (Durbin, 1953).
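The behaviour of the instrumental variable estimator can be illustrated with a small simulation. The sketch below is not part of the original analysis: the sample size, error standard deviations and the way the instrument is generated are all assumptions chosen for illustration, with the instrument constructed to be independent of both measurement errors and of the disturbance.

```python
import numpy as np

def iv_estimate(z, x1, x2):
    """Instrumental variable estimator: sum(z*x2) / sum(z*x1) on mean-centred data."""
    z, x1, x2 = (v - v.mean() for v in (z, x1, x2))
    return float(np.sum(z * x2) / np.sum(z * x1))

def ols_estimate(x1, x2):
    """OLS slope of x2 on x1, i.e. the IV estimator with the predictor as its own instrument."""
    return iv_estimate(x1, x1, x2)

def simulate(n=200_000, beta=1.0, reliability=0.8, seed=0):
    """Simulated (hypothetical) data following the true-score model with a valid instrument."""
    rng = np.random.default_rng(seed)
    t1 = rng.normal(size=n)                            # true first-occasion score, variance 1
    sd_e1 = np.sqrt((1 - reliability) / reliability)   # gives Var(T1)/Var(X1) = reliability
    x1 = t1 + rng.normal(scale=sd_e1, size=n)          # observed first-occasion score
    t2 = beta * t1 + rng.normal(scale=0.3, size=n)     # true second-occasion score
    x2 = t2 + rng.normal(scale=0.4, size=n)            # observed second-occasion score
    z = t1 + rng.normal(size=n)                        # instrument, independent of e1, e2, u
    return z, x1, x2

z, x1, x2 = simulate()
b_ols = ols_estimate(x1, x2)
b_iv = iv_estimate(z, x1, x2)
print("OLS:", b_ols, "IV:", b_iv, "ratio OLS/IV:", b_ols / b_iv)
```

Under these assumptions the OLS slope is attenuated towards beta times the reliability, while the IV estimate stays close to beta. Swapping `x1` and `x2` in `iv_estimate` returns the reciprocal of the original estimate.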



3.2 The Use of Many Instrumental Variables

When we have p instrumental variables $Z_j$, j = 1, ..., p,

$b_{IV} = (\sum_i \sum_j c_j Z_{ij} X_{2i})(\sum_i \sum_j c_j Z_{ij} X_{1i})^{-1}$    (19)

The combination of the $Z_j$ which gives the most efficient estimate of $\beta$ can be
found by choosing the $c_j$ so that $\mathrm{Corr}(\sum_j c_j Z_{ij}, X_{1i})$ is a maximum. The $c_j$ are
then the sample regression coefficients, $b_j$, of $X_1$ on $Z_j$, j = 1, ..., p.

Letting $\hat{X}_{1i} = \sum_j b_j Z_{ij}$ we obtain $b_{IV} = (\sum_i \hat{X}_{1i} X_{2i}) / (\sum_i \hat{X}_{1i} X_{1i})$.

The efficiency of the instrumental variable estimator is now the square of
the multiple correlation of the instrumental variable set with the predictor.
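The two-step calculation described here (regress the predictor on the instrument set, then use the fitted values as a single instrument) can be sketched as follows. The function name `iv_many` and the simulated data are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def iv_many(Z, x1, x2):
    """IV estimate using the fitted values of x1 regressed on the columns of Z."""
    Z = Z - Z.mean(axis=0)
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    b, *_ = np.linalg.lstsq(Z, x1, rcond=None)   # sample regression coefficients b_j
    x1_hat = Z @ b                               # fitted first-occasion score
    return float(np.sum(x1_hat * x2) / np.sum(x1_hat * x1))

# Hypothetical data: three instruments, each a noisy indicator of the true score.
rng = np.random.default_rng(1)
n = 100_000
t1 = rng.normal(size=n)
x1 = t1 + rng.normal(scale=0.5, size=n)          # observed predictor with error
x2 = 0.9 * t1 + rng.normal(scale=0.5, size=n)    # observed dependent variable, true slope 0.9
Z = np.column_stack([t1 + rng.normal(size=n) for _ in range(3)])

b_many = iv_many(Z, x1, x2)
print("IV estimate with three instruments:", b_many)
```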

3.3 The Use of Dummy Variables as Instrumental Variables

The previous discussion has assumed the existence of instrumental variables 
which can be modelled as having simple linear relationships with the first 
occasion variable. 

Two other cases can be distinguished: firstly, where an interval scaled
instrumental variable has a non-linear relationship to the first occasion
variable, and secondly, where the instrumental variable is categoric, for example
measured on an ordinal or nominal scale.

In the first case the non-linear relationship can be modelled, say by a
polynomial function, or the instrumental variable can be grouped into R categories.

In the latter case each category can be represented in the usual way by a
dummy variable. This takes the value 1 for this category and 0 for every
other category. Let $X_{1k}$, $X_{1k'}$ be two observations on the first occasion
variable which belong to the same instrumental variable category. Using the
dummy instrumental variables to estimate the first occasion variable gives
the estimate $\bar{X}_{1r}$, which is the mean value of all observations in category r.

Substituting in (19) gives

$b_{IV} = (\sum_r p_r \bar{X}_{1r} \bar{X}_{2r})(\sum_r p_r \bar{X}_{1r}^2)^{-1}$    (20)

where $\bar{X}_{2r}$ is the mean of $X_2$ in category r and $p_r$ is the proportion
in category r. This is essentially the "method of grouping" as introduced
by Wald (1940).

The literature on grouping methods (for example Wald (1940), Neyman and
Scott (1951), Madansky (1959)) using observed values of $X_1$ has tended to
focus on conditions for consistent estimates and on the relative efficiency
of different groupings rather than quantifying the inconsistency of various
grouping methods. The results on the NCDS data given below go some way to
remedying this situation for a particular data set.
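A minimal sketch of the grouping estimator follows, using category means and proportions, together with a check that it coincides with using the category dummy variables as an instrument set. The function names and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def grouping_estimate(groups, x1, x2):
    """Grouping estimator: sum_r p_r*m1_r*m2_r / sum_r p_r*m1_r**2 over categories r."""
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    num = 0.0
    den = 0.0
    for g in np.unique(groups):
        in_g = groups == g
        p_r = in_g.mean()            # proportion of cases in category r
        m1 = x1[in_g].mean()         # category mean of the first-occasion score
        m2 = x2[in_g].mean()         # category mean of the second-occasion score
        num += p_r * m1 * m2
        den += p_r * m1 ** 2
    return float(num / den)

def dummy_iv_estimate(groups, x1, x2):
    """Same estimate obtained by regressing x1 on one dummy variable per category."""
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    D = (groups[:, None] == np.unique(groups)[None, :]).astype(float)
    b, *_ = np.linalg.lstsq(D, x1, rcond=None)
    x1_hat = D @ b                   # fitted value is the category mean of x1
    return float(np.sum(x1_hat * x2) / np.sum(x1_hat * x1))

# Hypothetical data with a five-category instrument.
rng = np.random.default_rng(2)
groups = rng.integers(0, 5, size=2000)
x1 = groups + rng.normal(size=2000)
x2 = 0.9 * x1 + rng.normal(size=2000)

b_group = grouping_estimate(groups, x1, x2)
b_dummy = dummy_iv_estimate(groups, x1, x2)
print(b_group, b_dummy)
```

The two functions agree to rounding error because the fitted value from the dummy variable regression is simply the category mean.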



4. Application of instrumental variable methods to the NCDS data

4.1 Selection of Variables

In all, 50 variables are considered as instrumental variables, being measured
at ages 7, 11 and 16. These consist of test scores, teacher ratings and
background variables. The test scores are of reading and mathematics at
each age and in addition of general ability and copying designs scores at age
11. The teacher ratings are of reading and mathematics at all ages, and in
addition of oral ability and creativity at age 7, of oral ability and general
knowledge at age 11 and of practical subjects at age 16. The "background"
variables are social class and indices of behaviour in the home at all three
ages; the number of children in the household, birth order of the child, an
index of accommodation facilities and children's heights at ages 11 and 16;
overcrowding at 11; and region, indices of school behaviour, and a variety of
feelings towards school at 16. The reader is referred to Davie, Butler and
Goldstein (1972), and Fogelman (1976) for a more complete description of these
variables.

The variables used as predictor and dependent variable in the regressions,
that is the test scores at age 11 and 16 of mathematics and reading, are
transformed to have standard normal distributions in the same way as in
Goldstein (1979) who showed that near linear relationships between observed
scores resulted.

As the relative efficiency of an instrumental variable estimator is
proportional to the correlation with the predictor, variables are only retained
for further analyses when this correlation is greater than 0.3. This eliminates
most of the "background" variables but only one of the teacher ratings
(that of outstanding ability in any area at age 11) and none of the test scores,
leaving 25 variables in all. All cases with missing values on any of these
25 variables are excluded, leaving 5371 cases with test scores at each age.

In the appendix to Fogelman (1976), Goldstein shows that the attrition of
subjects in the study does not affect to any marked extent the relationships
found, and Goldstein (1979) finds that test scores for subjects having missing
values on the background variables show no significant differences from other
subjects.

In the results reported here, all instrumental variables are treated as sets
of dummy variables for reasons given in Section 2.4. For the test scores the
dummy variable coding into five roughly equal size groups ensures that the
relative loss in predictive efficiency from dummy variable compared to simple
linear regression is always less than 6%.

Using these instrumental variables as dummy variables or as interval scale
variables in fact gives very similar results, the maximum difference in
regression coefficients for any instrumental variable being 2%.



4.2 Forming hypotheses of error structure in prediction relations

As in Section 1.2 we assume that the true score of a test comprises that
component of test score which is unaffected by day to day variation, by the
particular tester or by the test situation, and is specific to the particular
test used. We now examine the correlation between the variables to be considered
as instrumental variables with measurement errors on the first occasion and
second occasion tests and with the disturbance term. These variables are
teacher ratings on a variety of attainments, test scores and social class.

The teacher ratings, like the test scores, in general will contain measurement
error, thus reflecting and being reflected by variations in the child's
interest in subjects and day to day variations in the type of relationship
to the teacher. Thus a teacher who has very recently seen a child do a
good piece of work or show a keen interest may tend to rate him higher than
otherwise. If the same contributory factors affect test score then a teacher
rating made at the same time as the test would be expected to have a positive
correlation with the test score measurement error. Likewise, where a different
attainment is rated at the same time as the test, similar correlations may
exist, although presumably smaller.






There are two variables which we hypothesise will have a zero correlation
with test score measurement error. These are teachers' ratings taken at
a different point in time and social class. We would expect none of the
sources of measurement error to relate to teacher rating at a point in time
4 or 5 years away. Nor would we expect social class, which does not vary
much for an individual over short time periods, to relate to any of the
sources of measurement error.

Finally we examine the relation of the disturbances $u_i$ to teacher ratings
and social class. Either of these variables, and particularly social class,
may be correlated with the disturbances if they relate to the dependent
variable once the predictor variable has been controlled for.

The hypotheses formulated above may be summarised thus:

H1. Teacher ratings on a test, where the rating is at the same time
    as the test, will be positively correlated with test score
    measurement error.

H2. Teacher ratings on a different attainment from that tested, where
    the rating is at the same time as the test, will be positively
    correlated with test score error but to a lesser extent than for
    the same attainment.

H3. Teacher ratings when the child is at a different age from that of
    the test will be uncorrelated with test score measurement error
    whether or not the same attainment is tested.

H4. Teacher ratings are not correlated with disturbance terms from the
    regression of second occasion score on first occasion score.

H5. Social class is correlated with disturbance terms.

H6. Social class is not correlated with test score measurement error.




Hypotheses H1, H2, H3 refer to the relation of teacher ratings to test
score measurement error. H1, H2 refer to teacher ratings at the same
time as the test and H3 at a different time. H4 refers to the relation of
teacher ratings to equation disturbances. H5, H6 refer to social class
and apply to the relation with equation disturbances and test measurement
errors respectively. Generally, teacher ratings are held to correlate
with test score measurement errors only when made at the same time as
the test and not to be correlated with equation disturbances. In contrast,
social class is hypothesised as correlating with disturbances but not with
test score measurement error even when measured at the same time as the test.

Examining (17), these six hypotheses give rise to the following predictions.
The hypotheses giving rise to each prediction are given in brackets after
the prediction.

P1. Comparing teacher ratings at 11 years, the lowest estimate of $\beta$
    will occur for the teacher rating of the same attainment (from H1 to
    H4, particularly H2 affecting $r_{ze_1}$), and at 16 years the highest
    value will occur for the rating of the same attainment (from H1 to
    H4, particularly H1 affecting $r_{ze_2}$).

P2. For a given attainment, teacher ratings will give higher estimates
    of $\beta$ when measured at 16 than 11 years (from H1 to H4).

P3. For a given attainment, teacher ratings will give higher estimates
    of $\beta$ when measured at 7 rather than 11 years (from H1 to H4).

P4. There is no difference in estimates between teachers' ratings at 7
    years (from H3, H4).

P5. Social class gives higher estimates of $\beta$ than teacher ratings at 7
    and 11 years (from H1 to H6).

P6. Social class will give similar estimates of $\beta$ irrespective of the age
    at which measured (from H5, H6).

It should be noted that these predictions are not unequivocal tests of the
hypotheses. For instance, even if P6 holds one could conceive of different
correlations of social class with measurement error at different ages,
these terms being counteracted by different correlations with equation
disturbances. This, however, seems unlikely.

4.3 Results 

Tables 1 and 2 give estimated regression coefficients for reading and
mathematics respectively for 16 year attainment on 11 year attainment using
a variety of teacher ratings and social class as instrumental variables at
ages 7, 11 and 16. Using the ungrouped 11 year test score as instrumental
variable gives the ordinary least squares estimate, which thus enables
reliability estimates to be calculated for each choice of instrumental
variable by dividing the ordinary least squares estimate by the instrumental
variable estimate. Each prediction will be examined in turn.
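The reliability calculation described here is a single division, sketched below for two of the reading coefficients reported in Table 1 (OLS estimate 0.797; instrumental variable estimates 0.990 using the teacher rating of number at age 7 and 0.944 using the teacher rating of reading at age 7). The function name is hypothetical.

```python
def reliability_from_iv(b_ols, b_iv):
    """Reliability estimate: OLS coefficient divided by the IV coefficient."""
    return b_ols / b_iv

# Reading test at 11, coefficients taken from Table 1.
rel_number_at_7 = reliability_from_iv(0.797, 0.990)
rel_reading_at_7 = reliability_from_iv(0.797, 0.944)
print(round(rel_number_at_7, 2), round(rel_reading_at_7, 2))
```

A larger instrumental variable estimate therefore implies a lower reliability estimate for the same OLS coefficient.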



Table 1  Estimated Regression Coefficients of Reading Test at 16 years on Reading Test at 11 years adjusted for
         Measurement Error, Using a Number of Instrumental Variables Separately

Instrumental variable measured at:

     7 years                        11 years                          16 years

Teacher rating of:
     Oral            0.955          Oral                 0.972        English              1.042
     Reading         0.944          Use of Books         0.964        Mathematics          1.067
     Number          0.990          Number               0.974        Practical Subjects   1.047
     Creativity      0.990          General Knowledge    0.979
Social Class         1.057          Social Class         1.070        Social Class         1.064

Reading Test at 11 (interval scale) (OLS estimate)   0.797
Reading Test at 11 (5 category groupings)            0.810

Average standard errors of regression coefficients:
     Using teacher ratings at 7 years     0.017
     Using teacher ratings at 11 years    0.013
     Using teacher ratings at 16 years    0.015
     Using Social Class                   0.027
     Using Reading Test at 11 years       0.008




Table 2  Estimated Regression Coefficients of Mathematics Test at age 16 on Mathematics Test at age 11 adjusted
         for measurement error using a number of instrumental variables separately

Instrumental variable measured at:

     7 years                        11 years                          16 years

Teacher rating of:
     Oral            0.883          Oral                 0.890        English              0.992
     Reading         0.849          Use of Books         0.884        Mathematics          1.073
     Number          0.854          Number               0.874        Practical Subjects   1.029
     Creativity      0.911          General Knowledge    0.920
Social Class         0.994          Social Class         0.991        Social Class         1.025

Maths Test at 11 (interval scale) (OLS estimate)     0.748
Maths Test at 11 (5 category grouping)               0.763

Average standard errors of regression coefficients:
     Using teacher ratings at 7 years     0.018
     Using teacher ratings at 11 years    0.015
     Using teacher ratings at 16 years    0.016
     Using Social Class                   0.030
     Using Mathematics Test at 11 years   0.010

P1. This holds for mathematics using teacher ratings both at 11 and 16
    and for reading for teacher ratings at 11 but not 16.

P2. This holds for comparable teacher ratings at 11 and 16 for both
    reading and mathematics test scores.

P3. This only holds for one out of the six possible comparisons, namely
    teacher rating of "number" for reading attainment regression.

P4. This holds for both attainments.

P5. This holds in all cases.

P6. This holds at all ages for both attainments.

It should be noted that predictions P4 and P6 specify no differences
between regression coefficients whereas the other predictions are of a
difference in a specified direction. In fact, for P4, P6 the differences
between coefficients are small in relation to the standard errors.

The predictions are all seen to hold generally with the exception of P3.
This implies the rejection of H3 or H4 or both. Rejecting H3 implies that
the differences in coefficients among the different teacher ratings at age
7 should be similar to those using the three corresponding teacher ratings
at age 11. This is true with one exception, this being the reversal of the
relative magnitude of the teacher ratings of reading and number between ages
7 and 11 for mathematics.

The smaller coefficient estimates using 7 year rather than 11 year ratings
could be explained by a correlation of 11 year rating with errors in the
dependent variable, which counteracts the correlation with errors in the
independent variable, the 7 year ratings having lower correlations with the
dependent variable.

If H4 is the sole reason for the failure of P3 this suggests that the partial
correlation of teacher ratings with social class at 11, given test score at
11, would be higher for 7 year teacher ratings than for 11 year teacher
ratings, and this is not the case.

It seems then that we should discard social class as a suitable instrumental
variable due to its correlation with the equation disturbances, and teacher
ratings at 16 years are positively correlated with test score error at 16
(from H1, H2) and probably also with equation disturbances. This leaves a
choice between teacher ratings at 7 and 11 years. As H3 does not hold we
cannot be completely content with using the same-attainment 7 year teacher
ratings, and in addition it is not known how highly correlated these are
with the disturbances. It was suggested earlier, however, that we should
expect the 11 year ratings to be more highly correlated with the disturbances.

As H2 also holds, the wisest choice would seem to be the rating of a different
attainment (out of mathematics or reading) at age 7. For the reading attainment
this gives a reliability of 0.81 (using teacher rating of number at age 7) and
for mathematics attainment gives a reliability of 0.89 using teacher rating
of reading at age 7. In fact the choice between 7 and 11 years for the
instrumental variable makes a little difference for mathematics attainment,
giving a reliability of 0.86 using reading rating at age 11, and for reading
attainment the difference in reliability estimate is only 0.005.

The question naturally arises here as to whether the use of tests of the same
attainment at different ages is necessary in order to obtain reasonable
estimates of reliability coefficients by this method. Estimates of the
reliability coefficient of 11 year reading test score obtained by regressing
16 year mathematics score on 11 year reading score, using separately as
instrumental variables teachers' ratings of reading and mathematics attainments
at 11 years, gave reliability estimates of 0.76 and 0.61 respectively
compared with the value of 0.81 given above. This suggests that the
disturbance terms in this regression are correlated with the mathematics
teachers' ratings. For the regression of 16 year reading test on 11 year
mathematics test using teachers' ratings of mathematics and reading separately
as instrumental variables, reliability estimates of 0.85, 0.68 respectively
are obtained, compared with the value of 0.88 given above. Care should
therefore be taken when estimating reliabilities by this method to use similar
attainments as dependent and predictor variables.



4.4 Use of grouped first occasion variable as instrumental variable

Table 3 gives estimated regression coefficients for both reading and mathematics
when the first occasion variable is grouped into 2, 3, 5 or 7 equal groups.

The inconsistency of the grouping estimator ($b_G$) relative to the ungrouped
(OLS) estimator ($b_{OLS}$) is given by

$k = (b_{IV} - b_G) / (b_{IV} - b_{OLS})$    (21)

where $b_{IV}$ is the instrumental variable estimator (using an appropriate
teacher rating) and is assumed to be consistent.
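As a worked illustration, the relative inconsistency k can be computed as (b_IV - b_G) / (b_IV - b_OLS), where b_G is the grouping estimate. The sketch below uses the reading coefficients from Table 3 and assumes that the consistent estimate b_IV is 0.990, the Table 1 value for the teacher rating of number at age 7; that choice is an assumption, made because it agrees with the tabulated k values to two decimal places.

```python
def relative_inconsistency(b_iv, b_grouped, b_ols):
    """k = (b_IV - b_G) / (b_IV - b_OLS); k = 1 means as inconsistent as OLS."""
    return (b_iv - b_grouped) / (b_iv - b_ols)

B_IV_READING = 0.990    # assumed consistent estimate (teacher rating of number at 7)
B_OLS_READING = 0.797   # ungrouped (OLS) estimate for reading

k_seven_groups = relative_inconsistency(B_IV_READING, 0.808, B_OLS_READING)
k_two_groups = relative_inconsistency(B_IV_READING, 0.827, B_OLS_READING)
k_ols = relative_inconsistency(B_IV_READING, B_OLS_READING, B_OLS_READING)
print(round(k_seven_groups, 2), round(k_two_groups, 2), k_ols)
```

Values of k near 1 indicate that grouping removes little of the OLS inconsistency; smaller values indicate that more of it has been removed.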



As suggested in 3.3, substantial inconsistencies are indicated in these data,
being greater with a larger number of groups. Thus there is a trade-off between
inconsistency and efficiency, the latter being greater as the number of groups
is increased. For reading attainment the lowest estimate of reliability, arising
from the division into two groups, is 0.97. This is higher than any of the values
derived from the regression estimates by instrumental variables methods given in
Table 1. Furthermore, the estimated regression coefficient is seen to vary with
the point of dichotomy. Whereas for reading the lowest estimates occur for both
extreme divisions, for mathematics the lowest estimate occurs when division is at
the lower end of the scale and the highest estimate when division is at the
higher end.

Table 3  Estimated regression coefficients and standard errors using the
         grouped predictor as instrumental variable. Standard errors in
         brackets. k is defined in (21).

                                   Reading            k       Mathematics        k

Ungrouped (O.L.S.)                 0.797 (0.0082)    1.00     0.748 (0.0097)    1.00

No. of groups (of equal size)
  7                                0.808 (0.0085)    0.94     0.755 (0.0100)    0.93
  5                                0.810 (0.0086)    0.93     0.763 (0.0102)    0.86
  3                                0.818 (0.0092)    0.89     0.777 (0.0109)    0.73
  2                                0.827 (0.0103)    0.84     0.780 (0.0121)    0.70

Varying position of dichotomy
Proportion in lower test group
  0.2                              0.810 (0.0012)    0.93     0.687 (0.01[?])   1.58
  0.4                              0.823 (0.0010)    0.87     0.765 (0.012)     0.84
  0.6                              0.822 (0.0010)    0.87     0.802 (0.012)     0.49
  0.8                              0.789 (0.0012)    1.04     0.805 (0.014)     0.46



If the assumption is made that the correlations of the grouped first occasion
variable with the disturbances and error in the second occasion variable are
both zero then using the reliability estimates from the previous section we can
substitute in (26) to obtain the correlations with the error in the first
occasion variable, $r_{ze_1}$. These are given below in Table 4, and assume the
reliability estimates given in Section 3.3 (0.81 and 0.89) are correct.



Table 4  Correlations of dichotomised instrumental variable and first occasion
         measurement error for different division points

Proportion below
division point        0.2      0.4      0.5      0.6      0.8

Reading              0.280    0.302    0.329    0.305    0.324
Mathematics          0.390    0.226    0.233    0.128    0.120



Thus the correlations, while reasonably constant for reading, are systematically
decreasing for mathematics. We have no good explanation for this but possible
causes are non-homogeneity of errors in the mathematics test, or a non-zero
correlation between true score and errors of measurement.

4.5 The use of test score as an instrumental variable 

Test scores of reading and mathematics at 7, 11 and 16 years, a score of General
Ability at 11 years (with Verbal and Non-verbal components) and a Copying Designs
Test are considered as instrumental variables, and results are given in Table 5.



Table 5  Regression Estimates of 16 year attainment test on 11 year attainment test in reading and mathematics
         using test scores as instrumental variables. Standard errors in brackets.

Instrumental variable                    Reading            Mathematics

at 7 years:
  Reading Test                           0.920 (0.015)      0.822 (0.017)
  Mathematics Test                       1.004 (0.020)      0.850 (0.018)

at 11 years:
  Reading Test                           0.810 (0.009)      0.866 (0.014)
  Mathematics Test                       0.995 (0.012)      0.763 (0.010)
  General Ability Test: Verbal           0.942 (0.012)      0.826 (0.013)
                        Non-Verbal       0.982 (0.014)      0.90[?] (0.014)
                        Overall          0.957 (0.012)      0.860 (0.013)
  Copying Designs Test                   0.989 (0.032)      0.933 (0.032)

at 16 years:
  Reading Test                           1.197 (0.013)      0.849 (0.015)
  Mathematics Test                       1.053 (0.015)      1.315 (0.017)



Since short term fluctuations in attainment to some extent will be correlated
over all attainments we would expect the arguments and predictions in Section
2.6 in relation to teacher ratings to apply to test scores. In all applications
considered here the test score is divided into 5 groups, and dummy variables
used. P1 is satisfied trivially in the light of the results on the use of a
grouped predictor as instrumental variable. P2 is satisfied, but P3 is again
contradicted by the behaviour of the reading tests at 7 and 11 when used as
instrumental variable for mathematics attainment.

Generally, the behaviour of test scores is similar to the teacher ratings and
the standard errors are similar, giving little indication for preference of
one set of variables over the other. Nevertheless, if correlations between
instrumental variables when measured at 7 years, and errors of prediction and
of measurement at 11 and 16 years were zero, then similar estimates of
regression coefficients should be found for the tests and teacher ratings of
different attainments, used as instrumental variables. In fact, for both
tests and teacher ratings and for both attainments the estimates are larger for
mathematics than for reading, when used as instrumental variables.

The results for general ability indicate that it occupies an intermediate
position between the two attainments.



4.6 Separate equations for each social class 

Table 6 gives the estimated regression coefficient of 16 year reading test
score on 11 year reading test score separately for each social class measured
at 11 years, using 11 year teacher rating of number as instrumental variable
in each case.



Table 6  Estimated regression coefficients for reading test at 16 on reading test at 11 for each 11 year social class
         separately, using teacher rating of number at 11 as instrumental variable

                                                                                     11 year test score    16 year test score
Social class            Regr.    Std.    Raw     True    Mean at   Mean at   No.     Rel.     Var.         Rel.     Var.
                        coeff.   error   corr.   corr.   11 years  16 years  cases

Professional            0.918    0.084   0.72    0.98     0.741     0.688     249    0.75     0.193        0.72     [illegible]
Intermediate            0.893    0.038   0.77    0.9[?]   0.485     0.479     903    0.84     0.138        0.78     0.175
Skilled non-manual      0.902    [illegible]  0.7[?]  0.97   0.360   0.384    497    0.80     0.17[?]      0.81     0.142
Skilled manual          0.980    0.023   0.80    0.97    -0.099    -0.099    2235    0.79     0.165        [illegible]  [illegible]
Semi-skilled manual     0.981    0.037   0.78    0.95    -0.238    -0.281     880    0.78     0.169        0.84     0.132
Unskilled manual        1.037    0.074   0.78    0.90    -0.567    -0.458     282    0.76     0.198        0.94     0.132
All (excluding no
male head)              0.974    0.014   0.80    0.96     0.046     0.035    5046    0.82     0.168        0.85     0.135

Raw corr.: raw correlation between 11 and 16 year scores. True corr.: estimated true correlation between 11 and
16 year scores. Mean: mean reading score. Rel.: reliability. Var.: variance of measurement error.

Test* for equality of 11 year reliability coefficients           $\chi^2$ = 11.9  (p < 0.05)
Test* for equality of 11 year measurement error variances        $\chi^2$ = 9.5   (p > 0.05)
Test for equality of true correlation coefficients               $\chi^2$ = 141.7 (p < 0.001)

*Test values for these tests are obtained from the robust chi-square test in Layard (1973) assuming the distribution
of the measurement errors has a kurtosis of 3. When a kurtosis of zero is assumed the values of $\chi^2$ for equality of
reliability coefficients and measurement errors respectively are 29.8, 23.7 (p < 0.001).


With the exception of the "professional" class the regression coefficients
systematically decrease with higher social class. The reliability estimates
of the reading test at 11 years in each social class are seen to vary, generally
increasing with higher social class. The exception to this is in the "professional"
class whose lower reliability may be explained by the presence of higher
measurement errors at the top of the reading scale. The measurement error
variances vary in an inverse fashion and this table does not provide evidence
that the measurement error variance is more nearly constant between social classes
than the reliability coefficient, as suggested in Goldstein (1979); indeed the
opposite seems to be the case. The estimated value of the true correlation
coefficient ($\rho$) within each social class is obtained from the equation

$\rho = r / (R_{11} R_{16})^{1/2}$

where r is the observed correlation coefficient and $R_{11}$, $R_{16}$ are the estimated
reliabilities of the 11 and 16 year test scores, the 16 year estimates
being obtained by using instrumental variable estimates of the regression of
11 year on 16 year test scores, the instrumental variable being the teachers'
rating at age 16 of mathematics. The estimated correlations vary with social
class, with the "professional" class having the highest value and the "unskilled
manual" class the lowest value. These values seem rather high, but it must
be remembered that the same reading test was used at 11 and 16 years. A 95%
confidence interval for the overall value is (0.962, 0.996).
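The correction for attenuation used to obtain the true correlation, namely the observed correlation divided by the square root of the product of the two reliabilities, can be sketched as below. The inputs are the rounded Table 6 entries for the "professional" class and for all classes combined; the function name is hypothetical.

```python
import math

def true_correlation(r_observed, rel_11, rel_16):
    """Disattenuated correlation: r / sqrt(R_11 * R_16)."""
    return r_observed / math.sqrt(rel_11 * rel_16)

# Reading test, rounded values from Table 6.
rho_professional = true_correlation(0.72, 0.75, 0.72)
rho_all = true_correlation(0.80, 0.82, 0.85)
print(round(rho_professional, 2), round(rho_all, 2))
```

Because the tabulated inputs are rounded to two decimal places, recomputed values for other rows need not match the published figures exactly.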

Table 7 gives the coefficients for the mathematics test, where teacher's
rating of the use of books at 11 years is used as the instrumental variable
for each class. Contrary to the reading scores, the regression coefficients
increase with higher social class. The reliability estimates of the
mathematics test at 11 years are seen to vary, with a lower value
for the "professional" class. The measurement error variances vary in an
inverse fashion. The extent of variation between social classes of the
reliability and measurement error variances is larger than for the reading test,
both reliability and measurement error variance estimates showing a similar
degree of variation.

The estimated true correlation coefficients show the same pattern as for the
reading test, with the values on the "professional" and "unskilled manual"
classes being respectively higher and lower than the average value. The 95%
confidence interval for the overall value is (0.89[?], 0.908).



Table 7  Estimated regression coefficients for mathematics test at 16 on mathematics test at 11 for each 11 year social class
         separately, using teacher rating of use of books at 11 as instrumental variable

                                                                                     11 year test score    16 year test score
Social class            Regr.    Std.    Raw     True    Mean at   Mean at   No.     Rel.     Var.         Rel.     Var.
                        coeff.   error   corr.   corr.   11 years  16 years  cases

Professional            1.081    0.120   0.73    0.94     0.778     0.716     249    0.72     0.210        0.83     0.146
Intermediate            0.920    0.042   0.73    0.88     0.454     0.510     903    0.83     0.136        0.84     0.133
Skilled non-manual      0.883    0.052   0.73    0.83     0.330     0.388     497    0.84     0.132        0.82     0.153
Skilled manual          0.869    0.026   0.71    0.89    -0.063    -0.098    2235    0.85     0.118        0.74     0.215
Semi-skilled manual     0.824    0.044   0.65    0.88    -0.248    -0.226     880    0.79     0.160        0.70     0.230
Unskilled manual        0.689    0.068   0.64    0.83    -0.478    -0.475     282    0.87     0.101        0.67     0.229
All (excluding no
male head)              0.885    0.016   0.74    0.90     0.039     0.071    5046    0.85     0.133        0.79     0.194

Raw corr.: raw correlation between 11 and 16 year scores. True corr.: estimated true correlation between 11 and
16 year scores. Mean: mean mathematics score. Rel.: reliability. Var.: variance of measurement error.

Test* for equality of 11 year reliability coefficients           $\chi^2$ = 34.5 (p < 0.001)
Test* for equality of 11 year measurement error variances        $\chi^2$ = 29.5 (p < 0.001)
Test for equality of true correlation coefficients               $\chi^2$ = 46.2 (p < 0.001)

*See note on Table 6.



5. Summary and Discussion

Our results suggest that differing correlations of instrumental variables with
measurement errors account for the observed differences in regression and
reliability estimates, although social class has a negligible correlation with
measurement error but a non-negligible correlation with the error of prediction.

The estimated correlation coefficients between true scores at eleven and
sixteen years are 0.96 and 0.90 for the reading and mathematics tests
respectively. The estimated reliabilities using the selected instrumental
variables (teacher rating of an unrelated ability at age 7 as the predictor)
are 0.81 and 0.88 for reading and mathematics respectively. These compare with
the values of [illegible] and 0.94 given in Goldstein (1979) by split half item
analysis on a subsample of 300 cases. Whilst the values for reading are similar,
the value obtained by item analysis for mathematics is somewhat higher than any
obtained for the instrumental variables used here, although the difference is of
the same order as the standard error of the separate estimates.

For reading attainment the estimated standard error of the reliability estimate
obtained by item analysis is 0.030 while the instrumental variable method gives
0.012. A split half estimate using all available data would have a standard
error of about 0.007. For mathematics the relevant standard errors are 0.020,
0.014 and 0.005.



Using a grouping of the predictor variable itself as an instrumental variable
gives estimates of the reliability which are higher than any obtained using
other variables as instrumental variables, irrespective of the number of groups
used. These estimators, we suggest, are not to be recommended.
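
The idea behind the instrumental variable reliability estimates summarised in this section can be illustrated with a minimal sketch. It assumes the textbook errors-in-variables result that the ordinary least squares slope is attenuated by the reliability of the predictor while the instrumental variable slope is not, so that their ratio estimates the reliability. The data are simulated, not NCDS data, and the function and variable names are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def iv_reliability(x, y, z):
    """Return (OLS slope, IV slope, implied reliability of x).

    x: fallible predictor, y: outcome, z: instrumental variable.
    Assumes z is uncorrelated with the measurement error in x and
    with the error of prediction of y.
    """
    b_ols = cov(x, y) / cov(x, x)   # attenuated by measurement error in x
    b_iv = cov(z, y) / cov(z, x)    # consistent under the assumptions above
    return b_ols, b_iv, b_ols / b_iv

# Simulated data: true score t, observed predictor x with reliability 0.8
rng = random.Random(1958)
n = 20000
t = [rng.gauss(0, 1) for _ in range(n)]
x = [ti + rng.gauss(0, 0.5) for ti in t]         # error variance 0.25
z = [ti + rng.gauss(0, 1.0) for ti in t]         # instrument with independent error
y = [0.9 * ti + rng.gauss(0, 0.6) for ti in t]   # later attainment score

b_ols, b_iv, rel = iv_reliability(x, y, z)
print(round(b_ols, 2), round(b_iv, 2), round(rel, 2))
```

If the instrument is correlated with the measurement error, the ratio is no longer a consistent estimate, which is the situation the comparisons of different instrumental variables are meant to detect.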

The examination of reading and mathematics attainment separately in each social
class gave different reliabilities for the different social classes.
For both attainments the "professional" social class gave the lowest value but
otherwise for the mathematics test lower reliabilities occurred in the lower
social classes and vice versa for the reading test. Measurement error variance
estimates varied in an inverse fashion. Instrumental variable estimates of
regression coefficients showed an opposite trend for each attainment, with the
coefficients for the reading test being generally higher in the lower social
classes and for the mathematics test being higher in the higher social classes.
Estimates of the true correlation between 11 and 16 years were different in each
social class for both reading and mathematics attainments, with higher values
in the "professional" class and lower values in the "unskilled manual" class for
each attainment.






While instrumental variable estimation has had a long history (early papers on
theory and application in the economic field include Wald (1940), Reiersol
(1945), Durbin (1953), Sargan (1958) and Madansky (1959)), it has not yet become
generally accepted as an estimation method in the social and educational fields.
Sargan (1958, page 396), in discussing the (unknown) correlation of instrumental
variables with measurement error in an economic context, states: "It is not easy
to justify the basic assumption concerning these errors, namely that they are
independent of the instrumental variables. It seems likely that they will vary
with a trend and with the trade cycle. In so far as this is true the method
discussed here will lead to inconsistent estimates of the coefficients. Nothing
can be done about this since presumably if anything were known about this type
of error, better estimates of the variables could be produced. It must be
hoped that the estimates of the variables are sufficiently accurate, so that
systematic errors of this kind are small." We have argued that comparisons
of different instrumental variables, considered separately, can throw some
light on the error structure in the data and thus lead to better knowledge of
the consistency of the estimates produced. Furthermore it is also our view
that this approach provides a flexible tool for an empirical study of the various
assumptions needed to produce good estimates.

Finally, four issues seem particularly worthy of attention: 

1. Obtaining estimates of the standard errors of the difference between
   different instrumental variable estimates (these will be lower than
   those obtained using the individual standard errors and assuming
   independence). This would enable a more careful analysis of the
   hypotheses of the paper.

2. Obtaining good estimates of the standard errors of the reliability
   estimates produced by instrumental variable methods.

3. Examination of the use of more than one instrumental variable in
   connection with a single predictor in terms of the efficiency and
   consistency of estimates.

4. The study of differing reliabilities and measurement error variances
   in different groups of variables such as social class, in order to
   incorporate these in linear model estimates.



REFERENCES

ANDERSEN, E B (1980) Comparing Latent Distributions. Psychometrika, 45, 121-134

BROWN, R L (1957) Bivariate Structural Relation. Biometrika, 44, 84-90

CRONBACH, L J, GLESER, G C, NANDA, H & RAJARATNAM, N (1972) The dependability of behavioural measurements: theory of generalisability for scores and profiles. Wiley, New York

DAVIE, R, BUTLER, N R & GOLDSTEIN, H (1972) From Birth to Seven: a report of the National Child Development Study. Longmans, London

DURBIN, J (1953) Errors in Variables. Review of Institute of International Statistics, Vol 22, 1954, 23-54

FOGELMAN, K (1976) Britain's Sixteen Year Olds. National Children's Bureau, London

GOLDSTEIN, H (1979) Some Models for Analysing Longitudinal Data on Educational Attainment. J. Roy. Statist. Soc., A, 142, 407-442

GOLDSTEIN, H (1980) Dimensionality, Bias, Independence, and Measurement scale problems in latent trait test score models. Br. J. Math. & Statist. Psychol., 33, 234-246

JOHNSTON, J (1972) Econometric Methods. McGraw Hill, New York

JORESKOG, K G (1971) Statistical Analysis of sets of congeneric tests. Psychometrika, 36, 109-133

KENDALL, M G & STUART, A (1977) The Advanced Theory of Statistics. Volume 2, Griffin, London

LAYARD, M W J (1973) Robust Large Sample Tests for Homogeneity of variances. J. Amer. Statist. Assn., 68, 195-198

LORD, F M & NOVICK, M R (1968) Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, Mass.

McDONALD, R (1981) The Dimensionality of Tests and Items. Brit. J. Math. and Statist. Psychol., 34, 100-117

MADANSKY, A (1959) The fitting of straight lines when both variables are subject to Error. JASA, 54, 173-205

NAIR, K R & BANERJEE (1942) A note on fitting straight lines if both variables are subject to error. Sankhya, 6, 131-333

NEYMAN, J & SCOTT, E L (1951) On certain methods of estimating the linear Structural Relation. Ann. Math. Statist., 22, 352-355

REIERSOL, O (1945) Confluence Analysis by Means of instrumental sets of Variables. Arkiv for Mathematik, Astronomi Och Fysik, Vol 32

SARGAN, J D (1958) The estimation of Economic Relationships using Instrumental Variables. Econometrica, 26, 393-415

WALD, A (1940) Fitting of straight lines if both variables are subject to error. Ann. Math. Statist., Vol 11, 284-300






SUMMARY

The method of Instrumental Variables is suggested as an alternative to
traditional methods for estimating the reliability of mental test scores, and
avoids certain drawbacks of these methods. The consistency and efficiency of
the instrumental variable method are examined empirically using data from the
British National Child Development Study in an analysis of 16-year and 11-year
old scores on tests of mathematics and reading.

Keywords

Instrumental Variables, Errors in Variables, Reliability, Longitudinal,
Educational Attainment.



Acknowledgments 



We are grateful to the National Children's Bureau for permission to use the
data and to Dougal Hutchinson for his comments on an early draft of the
paper. The work was carried out largely on a grant from the National
Institute of Education, Washington (NIE-G-77-0065).



APPENDIX 6 



SOME APPLICATIONS OF LISREL TO EDUCATIONAL DATA 



BY RUSSELL ECOB



Paper read to the Seminar on Structural Equation Modelling with Particular
Reference to LISREL at the University of London Institute of Education,
14th September 1981

NOT FOR REPRODUCTION WITHOUT PERMISSION

SUMMARY

Examples are given of the use of LISREL in both confirmatory and exploratory
modes. In the confirmatory mode a sequence of hypotheses is set up each of
which is a special case of the preceding one and is tested sequentially. In
the exploratory mode the data in conjunction with knowledge of the subject
matter is used at any stage to determine which parameter to add to a given
model. This paper emphasises, when used in a confirmatory mode, the importance
of the initial specification of the model and shows how more than one model
can be "confirmed" by the data. In the exploratory mode LISREL is used on the
data from the National Child Development Study to estimate the correlation
between the latent variables corresponding to underlying reading attainments
at two ages. Examined also is the effect of different fitted error correlations
on the correlation between latent variables and the effect on final estimates
of choosing different indicator variables and of scaling variables to have
normal distributions. The use of data on reading attainment at three ages
allows the incorporation of extra latent variables corresponding to test
specific factors. Finally an alternative method using instrumental variables
for estimating change in reading attainment is briefly described and compared
with the linear structural relations method.

USE IN A CONFIRMATORY MODE 

Joreskog and Sorbom (1977) describe the procedure thus:

"Suppose $H_0$ represents one model under given specifications of fixed, free
and constrained parameters. To test the model $H_0$ against any more general
model $H_1$, estimate them separately and compare their $\chi^2$. The difference
in $\chi^2$ is asymptotically a $\chi^2$ with degrees of freedom equal to the
corresponding difference in degrees of freedom. In many situations, it is
possible to set up a sequence of hypotheses such that each one is a special case
of the preceding and to test these hypotheses sequentially".
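
The difference test quoted above can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, the upper-tail probability is computed from the closed-form series for integer degrees of freedom rather than by LISREL, and the fit values are of the kind reported later in Table 3 of this appendix (36.44 on 13 degrees of freedom against 13.08 on 12).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range((df - 1) // 2):
        tail += term
        term *= x / (2 * j + 3)
    return tail

def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Compare a restricted model with a more general model in which it is nested."""
    diff = chi2_restricted - chi2_general
    ddf = df_restricted - df_general
    return diff, ddf, chi2_sf(diff, ddf)

# Zero error correlations (restricted) against equal error correlations (general)
diff, ddf, p = chi2_difference_test(36.44, 13, 13.08, 12)
print(round(diff, 2), ddf, p < 0.001)
```

A small p value means the restriction worsens the fit significantly and is rejected; a large one means the more restricted model is retained and becomes the baseline for the next comparison.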

' / '* ' : ■ • i. ' 1 

Werts, Breland, Grandy and Rock (1980) apply LISREL to the measurement of
writing ability at three occasions, by essay and by a test of standard written
English at each occasion, the measures being made on American undergraduates
during their first year of study, composition instruction being given between
the first two occasions. They aim to use LISREL to separate out the instability
or variation in the "true scores" (underlying or latent ability) over occasions
from the measurement errors which are supposed to be correlated across occasions
and thus derive estimates of reliability. It is this correlation between
measurement errors across occasions which makes the model more general than
the well known factor analysis model.

Each essay is marked independently on a scale from 1 to 6 by two outside teachers
to give a scale from 1 to 12 and the tests each consist of 50 multiple choice
items. The data consist of 234 out of a total of an initial 2500 students
tested at each occasion and, though this gives rise to a large potential
non-response problem, the data set is used here for didactic purposes.(1) The
correlation matrix is given in Table 1.

TABLE 1

           Test 1   Test 2   Test 3   Essay 1   Essay 2   Essay 3
Test 1     1.000
Test 2     0.837    1.000
Test 3     0.854    0.842    1.000
Essay 1    0.621    0.640    0.602    1.000
Essay 2    0.602    0.636    0.551    0.564     1.000
Essay 3    0.596    0.617    0.597    0.572     0.523     1.000



Their initial (most general) model is that the true scores of the tests and
essays are linearly related, the same latent variable being measured by both
indicators, and that the errors of measurement of the tests are uncorrelated
across occasions whereas the errors of measurement of the essays are correlated
across occasions. We call this the "essay error correlation" model.
An alternative hypothesis, not examined by Werts et al, is that the errors on
the tests are correlated between occasions but that the essay errors are not
correlated between occasions. We call this the "test error correlation" model.
The further alternative, that the true scores underlying the tests and essays
represent different latent variables and that neither of the errors are
correlated across occasions, leads to an unidentifiable or indeterminate model
having too many parameters to be fitted by the 21 independent observed correlations.
Note, however, that Mock & Saris (1981) provides a reinterpretation of the Werts et al
data using two latent variables having correlation 0.89 by making the assumptions that a
simplex relation holds between each set of latent variables and that at least one of three
other possible restrictions holds. Appendix 2 shows that the model with both sets of
errors correlated across occasions is also not identified.




The choice between the "essay error correlation" and the "test error
correlation" models cannot be made empirically as neither is a specialisation
of the other, the two models embodying different conceptualisations of the
true score or latent variable or equivalently of the allowed components of
error. They will be shown to lead to different parameter estimates, in particular
reliability estimates, by carrying through the process of Werts et al for both
models.

The most general model considered by Werts et al has the restriction on true
scores that the correlation between the true scores at occasions 2, 3 is
unaffected by the true score at occasion 1, or $\rho_{13 \cdot 2} = 0$. This is
equivalent to the restriction that $\rho_{13} = \rho_{12}\rho_{23}$ and is
called the simplex model of true scores. The model is shown diagrammatically
in figure 1 for the "test error correlation" case. The LISREL specification
of this model is given in Joreskog and Sorbom (1977), Section 4.2 and also in
Appendix 1 of this paper.
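
The simplex restriction can be checked numerically through the standard partial correlation formula: the correlation between occasions 1 and 3 with occasion 2 held constant vanishes exactly when the 1-3 correlation equals the product of the 1-2 and 2-3 correlations. The sketch below uses hypothetical true-score correlations, not estimates from the Werts et al data.

```python
import math

def partial_corr(r13, r12, r23):
    """Partial correlation between occasions 1 and 3 with occasion 2 held constant."""
    return (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

# Hypothetical true-score correlations between adjacent occasions
r12, r23 = 0.9, 0.8

simplex_r13 = r12 * r23                      # value implied by the simplex model
print(partial_corr(simplex_r13, r12, r23))   # zero under the simplex restriction
print(partial_corr(0.85, r12, r23))          # positive when r13 exceeds the product
```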

The model with unrestricted error correlations can be reformulated as one with
a test specific factor (see Appendix 6.3) and this is shown in figure 2. In testing
the series of possible models the restrictions on the relationship between the
true scores can be viewed as the structural component of the model and we have
in order of increasing restriction,

S1 No restriction on true scores
S2 Simplex restriction on true scores
S3 No change in true scores between occasions i, j
S4 No change in true scores overall

Similarly considering the measurement component given by the error correlations
(for a particular test) we have the following possibilities (restrictions apply
to both tests):

E1 No restriction on error variances or covariances
E2 Equal error variances on two occasions
E3 Equal error variances on all occasions
E4 Equal error correlations between two sets of occasions
E5 Equal error correlations between all occasions
E6 Zero error correlations between all occasions

We proceed by testing first the structural restrictions in turn and accepting
a model when the restriction gives no significant increase in $\chi^2$. Then we
test the restrictions on error correlations in turn on the accepted structural
model. The testing then continues on any remaining structural restrictions
using the current error correlation model, and then again on the error correlation
restrictions if necessary. The process stops if all more restricted models in
both structural and error sense give a significant increase in $\chi^2$.

Models ... are not included as they are not homogeneous.




Table 2 shows the decision process for the test error correlation model.
All structural restrictions are satisfied and the final model has 1 latent
variable with equal correlations of test error across occasions of 0.55.
Table 3 shows the decision process for the essay error correlation model.
Here the restriction S4 is rejected in the context of E1 but accepted in
the context of E5, the common essay error correlation being 0.21. The
significance values given by the tests when used in this way cannot be given
a rigid interpretation in probability terms as restrictions are sometimes
tested more than once and a given model is tested against more than one
other model.



TABLE 2  Test error correlation model: decision process

                                              Goodness of fit   Difference
                                              ($\chi^2$)        ($\chi^2$)     Significance (p)   Decision

Structural component
  No restriction on true scores               1.44
  Simplex restriction on true scores          2.45              1.01           > 0.05             accept
  No change in true scores                    4.73              2.79           > 0.05             accept

Measurement component (in context of S3)
  Equal error variances on all occasions
    (each test)                               10.02 (10 df)     5.29           > 0.05             accept
  Equal error correlations across occasions   12.40             2.38           > 0.05             accept
  Zero error correlations across occasions    36.44             24.04 (1 df)   < 0.001            reject






TABLE 3  Essay error correlation model: decision process

                                              Goodness of fit   Difference
                                              ($\chi^2$)        ($\chi^2$)     Significance (p)   Decision

Structural component
  No restriction on true scores               7.18 (3 df)
  Simplex restrictions on true scores         11.83             4.65 (1 df)    < 0.001            reject
  No change in true scores                    11.94             4.76 (3 df)    < 0.01             reject

Measurement component (in context of S1)
  Equal error variances                       10.95             3.19           > 0.05             accept
  Equal error correlations (E5)               11.49 (9 df)      1.46           > 0.05             accept
  Zero error correlations                     34.94 (10 df)     23.45 (1 df)   < 0.001            reject

Structural component (in context of E5)
  Simplex restriction on true scores          12.85             1.35           > 0.05             accept
  No change in true scores (S4)               13.08 (12 df)     0.23           > 0.05             accept

Measurement component (in context of S4)
  Zero error correlations
    (as final model in Table 2)               36.44 (13 df)     23.36 (1 df)   < 0.001            reject






Note that the initial "essay error correlation" model has a bad fit to the
data whereas the "test error correlation" model fits the data adequately
in its most general form.

The final models are thus different for both forms. For the "test error
correlation" model the variances of the test and essay errors are 0.34, 0.45
respectively and the reliabilities of tests and essays are 0.66, 0.56
respectively. The "essay error correlation" model has values 0.16, 0.56 for
the variances of test and essay errors and 0.84, 0.44 for
the reliability of tests and essays (reliability is defined as
$R = \sigma_T^2 / \sigma_X^2$ where $\sigma_X^2 = \sigma_T^2 + \sigma_e^2$,
$\sigma_X^2$ = variance of observed scores = 1 in our
case due to use of correlation matrix, $\sigma_T^2$ = variance of true scores,
$\sigma_e^2$ = variance of measurement error).

The estimates of error variances or equivalently of reliabilities are thus
dependent on the initial modelling framework.
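
Because a correlation matrix is analysed, the observed score variance is 1 and the reliability is simply one minus the error variance. A minimal sketch of this arithmetic, using three of the error variances quoted above (the function name is hypothetical):

```python
def reliability(error_variance, observed_variance=1.0):
    """Reliability as true score variance divided by observed score variance."""
    if not 0 <= error_variance <= observed_variance:
        raise ValueError("error variance must lie between 0 and the observed variance")
    return (observed_variance - error_variance) / observed_variance

# Error variances quoted in the text for the two final models
cases = [
    ("tests, test error correlation model", 0.34),
    ("tests, essay error correlation model", 0.16),
    ("essays, essay error correlation model", 0.56),
]
for label, error_variance in cases:
    print(label, round(reliability(error_variance), 2))
```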

Werts et al justify their initial decision on the "essay error correlation"
model by reference to cited research which includes evidence that good
handwriting leads to higher essay scores regardless of content. It would
be equally possible to defend the "test error correlation" model on the basis
of subject specific reactions to the testing process. Indeed a whole day
of a recent Symposium (4th International Symposium on Educational Testing,
Antwerp, June 1980) was devoted to one possible mediating component, test
anxiety.

Change in Attainment Over Time: Exploratory and Confirmatory Analyses

LISREL is now used in the analysis of the change in reading attainment between
ages 7 and 11 and between ages 7, 11 and 16 on data from the National Child
Development Study (NCDS), Davie et al, (1970). This study followed up about
17,000 children born in one week of March 1958 at ages 7, 11 and 16. An
obvious limitation on the use of LISREL to which attention is drawn by Joreskog
and Sorbom (1977) is the requirement that the measures used actually measure
the latent traits or hypothetical constructs. The analysis of the Werts et al
data and Appendix 3 shows that assumptions which we make on the nature of the
correlation between the errors of measurement can be viewed as part of the
definition of the latent traits.

On the NCDS data this form of analysis requires that we have a number of reasonable
measures at each occasion of the trait in question.



The following measures of reading attainment from the NCDS study are considered
and listed here with a code.

MEASURE                                             CODE        VARIABLE NUMBER

Age 7
  Reading Test (Southgate)                          RTST 7      1
  Teacher rating of reading                         RTR 7       2
  Teacher rating of child's present standard
    on reading scheme                               RSTD 7      3

Age 11
  Reading Comprehension Test                        RTST 11     4
  Teacher ratings of the Use of Books               U of B 11   5
  Teacher rating of Oral Ability                    OTR 11      6
  Verbal component of General Ability Test          VGAT 11

Age 16
  Reading Comprehension Test                        RTST 16
  Teacher rating of English ability                 ETR 16



Whilst we have three seemingly valid measures of reading attainment at age 7,
only the first two of the 11 year measures have face validity, the Oral
teacher rating and the verbal component of the General Ability test perhaps
measuring different but related skills. The teacher rating of English at
age 16 may also lack face validity. The same reading test was used at ages
11 and 16.

Desirable qualities in the data and their preliminary examination

The maximum likelihood estimation procedure of LISREL requires for consistency
of estimates that the data are multinormally distributed. Under these conditions
the variance-covariance matrix between variables represents all the information
in the data as all cumulants above the second are zero. Multinormality can be
tested by various methods. These are well reviewed by Bock (1975) and Gnanadesikan
(1977) and further methods are given by Cox and Small (1978) and Barnett and Lewis
(1979).



Multinormality requires both that marginal distributions are normal and that
relationships between all variables are linear (though it is not implied by
this). In practice data usually differ from this ideal situation, for instance,
due to the relationships between variables being non linear or due to the variables
not being marginally normally distributed. In these cases variables may be
transformed so that one or more of these conditions hold. (The references
given above refer as much to these transformations as to the testing of
multinormality, the two issues being closely related). It is also conceivable
that linear relations occur between latent variables even when they do not
occur between the indicators. However this requires that the relationship
between the indicators and latent variables is non linear. This cannot be
modelled in LISREL though it can be modelled in other programs (cf McDonald,
1980 and Clogg, 1977)(1) and this allows the use of categorical variables as
indicators of continuous latent variables. Alternatively, non-linear factor
analysis (McDonald 1967) allows the factors or latent variables to be
non-linearly related.

In the present data the aim was to use initial transformations to render the
data as far as possible linear. The reading test at 11 years was scaled to
have a normal distribution and the reading test at 7 years old was scaled to
have a linear relationship to the 11 year score, the non-linear relationship
for the raw data being interpreted as due to a ceiling effect on the 7 year
scores. This scaling reduces but does not eliminate the skewness of the 7
year distribution. Scaling of the 7 year score to have a normal distribution
would give a non-linear relationship between the scores at the two ages.

The teacher ratings, rated on a scale from 1 to 5, are viewed as categorisations
of an interval scaled variable and so the distributional aspects are considered
relevant. The distributions generally differed from normality in having a
negative kurtosis (except Oral TR at 11 years) and a transformation to
normality increased the more extreme values. Such a transformation also
changes the relation to the test scores. Thus it is not possible to examine
the effect of such a transformation whilst retaining the relationship between
all the variables on the present data. However analyses are presented with the
ratings scaled to normality (OSc) and unadjusted (O) in order to examine the
sensitivity to this transformation of the parameter estimates and of the
particular parameters which are freed.

Table 4 gives the correlations between the variables for both variable sets. It
is seen that the transformation to normality of the teacher ratings reduces
most of the correlations, particularly those involving the teacher rating of
Use of Books and the teacher ratings of Oral ability at 11 years, increasing
none of the correlations significantly.



1. Also Muthen and Dalquist (1980)




TABLE 4

MATRICES OF CORRELATIONS FOR SELECTED SUBSETS OF NCDS READING ATTAINMENT DATA

DATA SET "O"

             R TST 7   R TR 7   R STD 7   R TST 11   U OF B 11   O TR 11
R TST 7      1.000
R TR 7       0.738     1.000
R STD 7      0.705     0.635    1.000
R TST 11     0.632     0.614    0.530     1.000
U OF B 11    0.607     0.601    0.511     0.380      1.000
O TR 11      0.534     0.542    0.409     0.618      0.707       1.000

DATA SET "OSc"

             R TST 7   R TR 7   R STD 7   R TST 11   U OF B 11   O TR 11
R TST 7      1.000
R TR 7       0.742     1.000
R STD 7      0.622     0.599    1.000
R TST 11     0.632     0.616    0.461     1.000
U OF B 11    0.497     0.494    0.386     0.562      1.000
O TR 11      0.458     0.468    0.355     0.537      0.616       1.000

DATA SET "V"

             R TST 7   R TR 7   R STD 7   R TST 11   U OF B 11   V GAT 11
R TST 7      1.000
R TR 7       0.747     1.000
R STD 7      0.713     0.705    1.000
R TST 11     0.638     0.616    0.546     1.000
U OF B 11    0.618     0.60     0.534     0.684      1.000
V GAT 11     0.680     0.635    0.577     0.745      0.663       1.000

DATA SET "2" is as "O" except the last column is deleted




The linearity of the relations between variables was examined for sets both
with transformed and non-transformed teacher ratings by examining the contribution
of the squared term to the regression of each variable on the 11 year test
score. Although all but 1 coefficient was significant (on a sample size 106[remainder illegible]),
the increase in $R^2$ due to fitting this term was always less than 0.02, being
greatest for the scaled data for Oral Teacher Rating (0.009) and for the
Verbal General Ability (0.008) and for the non-scaled data for Reading Standard
teacher rating (0.013) and verbal general ability (0.008). Except for the
Reading Standard teacher rating at 7 years the scaled teacher ratings had less
linear relationships to the 11 year test score than the non-scaled ratings.
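
The check described above, the increase in $R^2$ when a squared term is added to a linear regression, can be sketched with a small least squares routine. The code is illustrative only: it uses made-up data and hypothetical function names, and solves the normal equations directly, which is adequate for low-degree polynomials on well-scaled data.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(x, y, degree):
    """R squared of the least squares polynomial regression of y on x."""
    cols = [[xi ** d for xi in x] for d in range(degree + 1)]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * yi for p, yi in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * xi ** d for d, b in enumerate(beta)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def r2_gain_from_square(x, y):
    """Increase in R squared when a squared term is added to the linear regression."""
    return r_squared(x, y, 2) - r_squared(x, y, 1)

# Made-up scores: a mildly curved relationship
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0.0, 1.1, 2.0, 2.8, 3.5, 4.1, 4.6, 5.0, 5.3, 5.5]
print(round(r2_gain_from_square(x, y), 3))
```

A gain close to zero, as reported for the NCDS variables, indicates that the squared term adds little to a linear description even when its coefficient is statistically significant in a large sample.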

Further examination on unscaled data, taking a random sample of 500 cases and
fitting terms up to $x^5$ in the regression, gave non-significant coefficients
except in the regression of the General Ability test at 11 on the reading test at 11,
where the $x^2$ coefficient was significant at the 5% level. In a regression of
teacher ratings on the General Ability test none of the higher order coefficients
were significant. Overall then the degree of non-linearity in the data was not
considered substantial, though the lower correlations of many of the teacher
ratings with the normally distributed 11 year test score when transformed suggest
the presence of some degree of non-linearity at least in the transformed data.

The face validity of the teacher ratings of Oral ability and of the Verbal component
of the General Ability test is examined by runs of LISREL on the 7 and 11 year
reading attainments with each of these indicators included separately (datasets
"O" and "V") and also with none of them (dataset "2": 2 indicators only at 11
years). If all the other indicators are suitable then the suitability of these
indicators will be judged by the similarity of the parameter estimates for the
data sets which include them and those which do not. These consist of the
correlations between latent variables at different occasions, the correlations
between errors of measurement and the variance of the errors or, equivalently, the
reliability of the measures.






Finally the large sample size results in a significant lack of fit for models
which would fit for smaller sample sizes and thus using the criterion of
overall fit to select the appropriate model sometimes results in a large number
of parameters being fitted. One way round this problem is to use a nominal
sample size (say 1000) as the basis on which to compare the models. As well as
giving the fit of each model to the data, the value of the sample size at which
the model fits at the 0.05 level is also given and when this rises to above the
nominal sample size this procedure would choose this as the final model.
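
The sample size at which a model just fits at the 0.05 level can be sketched as follows. The calculation assumes, as is usual for maximum likelihood fit statistics, that $\chi^2$ is proportional to N - 1 for a fixed discrepancy between model and data; this is an assumption about how the quantity was computed, not a statement taken from the text. The sample size of 10000 and the 8 degrees of freedom below are illustrative values, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range((df - 1) // 2):
        tail += term
        term *= x / (2 * j + 3)
    return tail

def chi2_critical(df, alpha=0.05):
    """Critical value found by bisection on the upper-tail probability."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size_at_alpha(chi2, df, n, alpha=0.05):
    """Sample size at which the observed discrepancy would just reach significance."""
    return 1 + chi2_critical(df, alpha) * (n - 1) / chi2

# Illustrative values: a chi-square of 492.0 on 8 df from 10000 cases
n05 = sample_size_at_alpha(492.0, 8, 10000)
print(round(n05))
```

A model whose value exceeds the nominal sample size (say 1000) would be accepted under the procedure described above, even though it is rejected at the full sample size.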

In addition we are interested in the extent of change in the estimated
correlation between the latent variables as we proceed through the model choice
process. This is affected by the particular error correlations which are
different from zero. Appendix 6.4 gives the appropriate parameterisation of this
problem in the LISREL format, in order that error correlations between occasions
can be estimated. Figure 3 describes the model and Appendix 6.5 briefly describes
and justifies the model choice procedure used, that of freeing the parameter
with the largest first order derivative. First, exploratory analyses of the 7
and 11 year data are presented. Then the 7, 11 and 16 year data are examined via
models which utilise both exploratory and confirmatory approaches.

Exploratory Analyses of 7 and 11 year data 

Table 5 gives the model selection process for each data set. $\rho$ (= [symbol illegible]) is the
estimated correlation of the latent variables between occasions and $n_{0.05}$ is the
sample size at which the solution reaches the 0.05 significance level.

In the columns labelled "largest first order derivative" and "largest residual" are the
variable numbers involved.

It can be seen that when a positive error correlation between occasions is fitted,
the correlation between latent variables is reduced and vice versa when a positive
error correlation between variables at the same occasion is fitted. (For example
data set OSc models 1 to 2, 4 to 5). The opposite effect occurs when negative
error correlations are fitted. In 5 out of 14 occasions the largest first derivative
occurred between the same variables as did the largest residual.

The estimates of $\rho$ for each of the nonscaled data sets, 0.821, 0.829, 0.822, are
quite similar, and more similar than are found using a nominal sample size of 1000.



TABLE 5  PROCESS OF MODEL SELECTION ON THE FOUR DATA SETS FROM NCDS

(Entries shown as ".." are illegible in the source; degrees of freedom are given where legible.)

Data    Order of model   Fit                                           Largest first      Largest
set     selection        ($\chi^2$)      p value   n(0.05)   $\rho$    order derivative   residual

"O"     1                492.0                     337       0.819     5,6                1,4
        2                187.0 (7 df)              805       0.842     1,2                3,6
        3                26.0                      5177      0.828     2,6                1,6
        4                15.1 (5 df)     0.01      7852      0.81..    3,6                3,6
        5                5.4 (4 df)      0.25      18773     0.821

"OSc"   1                938.6                     177       0.819     5,6                5,6
        2                106.1 (7 df)              1420      0.845     3,4                3,4
        3                38.1 (6 df)               3533      0.852     6,2                6,3
        4                27.1                      4334      0.850     5,2                6,3
        5                15.7 (4 df)     0.004     645..     ..        ..                 ..
        6                7.6 (3 df)      0.04      1097..    ..

"V"     1                233.7 (8 df)              540       0.856
        2                180.0 (7 df)              638       0.850     5,6                3,4
        3                96.2                      1066      0.843     1,2                3,4
        4                19.3            0.002     4681      0.827     2,6                2,6
        5                4.8             0.31      16090     0.829

"2"     1                158.7                     637       0.845     1,2                3,4
        2                5.3             0.15      15740     0.822

Table 6 gives the estimated error correlations and Table 7 gives the estimates of
reliability.

The nature of the error correlations between the measures at age 7 and the first two measures
at age 11 is seen to be little affected by the third test at age 11. However, the scaling
of the teacher ratings markedly affects the error structure, giving a larger proportion of
error correlations between occasions.

Similarly the reliability estimates of all other measures are little affected by the inclusion
or the choice of the third test at age 11. The scaling of the teacher ratings, however,
reduces the reliability estimates of the teacher ratings and also affects those of the tests.



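TABLE 6  ESTIMATES OF CORRELATIONS BETWEEN ERRORS OF MEASUREMENTS FROM FINAL MODELS IN THE
FOUR DATA SETS

Data set "O"
             R TST 7   R TR 7   R STD 7   R TST 11   U OF B 11   O TR 11
R TST 7       1
R TR 7       -0.45      1
R STD 7       0         0        1
R TST 11      0         0        0         1
U OF B 11     0         0        0         0          1
O TR 11      -0.07      0       -0.03      0          0.28        1

Data set "OSc"
             R TST 7   R TR 7   R STD 7       R TST 11   U OF B 11   O TR 11
R TST 7       1
R TR 7        0         1
R STD 7       0         0        1
R TST 11      0         0        [illegible]   1
U OF B 11    -0.04      0.07     0             0          1
O TR 11       0         0.06     0             0          0.21        1

Data set "2"
             R TST 7   R TR 7   R STD 7   R TST 11   U OF B 11
R TST 7       1
R TR 7       -0.36      1
R STD 7       0         0        1
R TST 11      0         0        0         1
U OF B 11     0         0        0         0          1

Data set "V"
             R TST 7   R TR 7   R STD 7   R TST 11   U OF B 11   V GAT 11
R TST 7       1
R TR 7       -0.45      1
R STD 7       0         0        1
R TST 11      0         0        0         1
U OF B 11     0         0        0         0          1
V GAT 11      0.04      0.01     0         0          0.23        1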



TABLE 7  RELIABILITY ESTIMATES FROM FINAL MODEL ON THE FOUR DATA SETS

Data set   R TST 7   R TR 7   R STD 7   R TST 11   U OF B 11   O TR 11   VGAT 11

"O"        .83       .79      .59       .71        .66         .55       -
"OSc"      .77       .72      .50       .74        .42         .38       -
"2"        .84       .80      .59       .70        .66         -         -
"V"        .83       .79      .62       .74        .67         -         .79



These results give some reassurance that the inclusion of the third indicator has not
markedly affected the latent variables at each occasion, though of the two extra indicators
the General Ability test has slightly more effect than the Oral teacher rating on the other
parameters in the model. They also suggest that each measure is a valid one.

An addition of extra indicators provides extra information which reduces the standard errors
of the estimates, and so under these conditions they should both be added to the model. The
standard error of the correlation between the latent variables at 7 and 11 years is
reduced from 0.006 to 0.005 by the addition of the oral teacher's rating.



Analysis of data on NCDS at three occasions

The exploratory analysis just considered is useful to the present analysis in two
respects. Firstly it suggests that either or both of the Oral teacher rating and
the General Ability test can be used as an indicator of the 11 year reading
attainment, and secondly it suggests that the teacher ratings taken at age 11 have
correlated errors and that these also occur between the reading test at 7 and the
teacher rating of reading at the same age. The full path model for the relations
between the latent variables at each occasion, but where the errors are constrained to be
uncorrelated, is shown in figure 4. The correlation coefficients $\rho_{ij}$ and the
conditional correlation coefficients $\rho_{ij/k}$, where $\rho_{ij/k}$ is the conditional
correlation between the latent variables at occasions i, j given the value at
occasion k, are related to [symbols illegible]. This model (M3) is a saturated model for
the structural aspects of the model as all possible parameters in the model are
estimated. The likelihood ratio test of goodness of fit of this model gives
$\chi^2_{17}$ = 1090.7.



Two alternatives are now possible: either to elaborate the model in an exploratory
sense by freeing error covariances corresponding to the highest first order derivatives
or some other criterion, or to hypothesise some particular structure on the errors
corresponding to one or more test specific factors. We have seen (Appendix 3) that a
test specific factor is a reformulation of a set of non zero error covariances when
3 indicators have mutually correlated errors. It is more restrictive, though more
easily interpretable, when more than 3 indicators have mutually correlated errors.
One important restriction of models of this type is that no test specific factor for
all the three or fewer indicators at a particular occasion can be hypothesised, as
this would not allow the relation between the latent variables at different occasions
to be uniquely determined. However it is natural to choose one test specific factor
for all tests and another for all teacher ratings, these being considered orthogonal.
This model, M5, described in figure 5 gives $\chi^2$ = 185.6, a substantial improvement
in fit. To what extent is the error variance of the indicators accounted for by this
model? Table 8 gives the error variance of each indicator showing the proportion
accounted for by the test specific factor.



TABLE 8  CONTRIBUTIONS OF TEST SPECIFIC FACTOR TO TOTAL ERROR VARIANCE

                   Total error   Test factor contribution   Remaining   Proportion of variance
                   variance      to overall variance        variance    accounted for by test factor

Tests
  RTST 7           0.177         0.006                      0.171       0.04
  RTST 11          0.258         0.130                      0.128       0.51
  RTST 16          0.254         0.076                      0.178       0.30
  VGAT 11          0.267         0.002                      0.269       0.01

Teacher ratings
  RTR 7            0.321         0.138                      0.183       0.43
  RSTD 7           0.407         0.019                      0.388       0.05
  U OF B 11        0.383         0.011                      0.372       0.03
  ETR 16           0.403         0.009                      0.304       0.02



Thus a large proportion of the error variation in the reading tests at ages 11 and 16 is
accounted for by the test specific factor corresponding to the tests, and a large
proportion of the error variation in the teacher rating of reading at age 7 is accounted
for by the test specific factor corresponding to the teacher ratings. The proportions
for the other indicators are all low. The correlation between errors on the reading
tests at ages 11 and 16 fitted by this model is 0.392 and corresponds to the artefacts
introduced into the data by using the same reading test at the two ages.

Examining the first order derivatives from the full path model with uncorrelated errors
(M3), it is this error covariance which has by far the highest value, and freeing only
this covariance gives model M4 with $\chi^2_{16}$ = 362.0, again a substantial improvement, the
estimate of the error correlation being 0.416. This is larger than the previous value
as there are now no constraints on the test specific factor due to the other tests.

Examination of the loadings ($\lambda$) for model M5 shows that at each age the ratio of the
lowest to the highest is greater than 0.85, suggesting that each indicator is given a
similar weight in determining the latent variable. At 7 years the test is most
influential and at 16 years least influential.



Is the fit good enough?

Although the goodness of fit of each of these models is highly significant the sample size,
8189 cases, is very large. Bentler and Bonnet (1981) argue that models can be compared
within and between studies by using an index of incremental fit which gives the relative
improvement in the fitting criterion from one model to another on the same data. A
suggested index is the normed fit index

$$\Delta_{kl} = (F_k - F_l) / F_0$$

where k, l are the compared models, 0 is a suitable null model and $F_i$ is the value of the
fitting criterion for model i. A suitable null model in our case is taken as consisting of
1 common latent variable for all tests on all occasions. A possible alternative is the model M3.

Table 9 gives the set of models described earlier together with the simplex model
described in the first section (model M2) and the 1 latent variable model M1, and gives
values of $\Delta$ for each model.

TABLE 9  GOODNESS OF FIT AND NORMED INCREMENTAL FIT INDICES FOR A RANGE OF MODELS

MODEL (i)                                        GOODNESS OF FIT ($\chi^2$)   NORMED FIT INDEX ($\Delta$)

M1  One latent variable                          5230.0                       -
M2  Simplex true scores                          1253.5                       0.760
M3  Full path model                              1090.7                       0.791
M4  M3 with one non zero error correlation        362.0                       0.930
M5  M3 with two test specific factors             185.6                       0.964
M6  M5 with four error correlations                 7.7                       0.999



For model M5 the value of $\Delta$, 0.964, is amongst the highest of the examples
given in Bentler and Bonnet and suggests that this model could be considered a final
model.
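
The normed fit index can be checked directly from the reported $\chi^2$ values. The following is a minimal sketch, assuming the one latent variable model M1 is the null model and taking $\chi^2$ = 362.0 for M4 as quoted in the text; the dictionary and function names are introduced here for illustration.

```python
# Chi-square values for models M1 to M6 as quoted in the text.
# M1 (one common latent variable) is taken as the null model.
CHI2 = {"M1": 5230.0, "M2": 1253.5, "M3": 1090.7,
        "M4": 362.0, "M5": 185.6, "M6": 7.7}


def normed_fit_index(f_k, f_l, f_null):
    """Normed incremental fit index (F_k - F_l) / F_0."""
    return (f_k - f_l) / f_null


f0 = CHI2["M1"]
for name, chi2 in CHI2.items():
    if name != "M1":
        # Improvement of each model over the null model itself (k = null model).
        print(name, round(normed_fit_index(f0, chi2, f0), 3))
```

The same function gives the incremental improvement between any two nested models, for example from M3 to M5, by passing their two $\chi^2$ values as the first two arguments.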

However, by successively freeing error covariances in the context of model M5, using the
maximum first order derivative as an indicator of which to free, we arrive at a
final model M6 having four extra correlations between errors, giving $\chi^2$ = 7.71, p > 0.1.
The first error covariance to be freed is that between the reading test at 7 years and
the Reading Standard rating at 7. This gives a slightly higher first order derivative than
that between the 7 year test and the teacher rating of reading, which was indicated by the high
correlation between the errors in these variables in the earlier analysis. Other freed
error correlations are between the Verbal General Ability test and the teacher ratings
at 11 and 16, and the correlation between the reading standard rating at 7 and the teacher
rating of English at 16.

Effect of model choice on parameter estimates

Table 10 gives estimates of $\rho_{ij}$ and $\rho_{ij/k}$ and the reliability estimates of each
indicator when the test specific factors have been included in the true score, for models
M3, M4, M5 and M6.



TABLE 10  ESTIMATES OF CORRELATIONS BETWEEN LATENT VARIABLES AND RELIABILITY OF
INDICATORS FOR 4 MODELS

                            M3       M4       M5       M6

$\hat\rho_{21}$             .848     .858     .867     .861
$\hat\rho_{32}$             .968     .956     .957     .959
$\hat\rho_{31}$             .777     .794     .794     .805
$\hat\rho_{32/1}$           .989     .881     .887     .880
$\hat\rho_{31/2}$          -.756    -.175    -.247    -.144
$\hat\rho_{21/3}$           .840     .555     .608     .530

Reliability (RTST7)         .78      .78      .83      .78
     "      (RTST11)        .78      .86      .88      .78
     "      (RTST16)        .79      .85      .83      .82
     "      (VGAT11)        .71      .74      .73      .86
     "      (RTR7)          .72      .72      .82      .56
     "      (RSTD7)         .63      -        -        .80
     "      (U OF B11)      .61      .63      .63      .65
     "      (ETR16)         .65      .69      -        .71



The fitting of test specific factors is seen to affect all the parameters, especially
the conditional correlations, which have more reasonable values. The standard errors
of the estimated correlations are of the order 0.02.
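
The conditional correlations in Table 10 can be reproduced from the three pairwise correlations. The sketch below assumes they are the usual first-order partial correlations (the text does not give the formula); the function name is introduced here for illustration. With the model M4 values it agrees with the tabulated conditional correlations to about two decimal places.

```python
import math


def conditional_correlation(r_ij, r_ik, r_jk):
    """Correlation between i and j given k, from the three pairwise correlations."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Correlations between the latent variables for model M4 (Table 10);
# occasions 1, 2, 3 are ages 7, 11 and 16.
r21, r32, r31 = 0.858, 0.956, 0.794

print(round(conditional_correlation(r32, r21, r31), 3))  # occasions 3, 2 given 1
print(round(conditional_correlation(r31, r21, r32), 3))  # occasions 3, 1 given 2
print(round(conditional_correlation(r21, r31, r32), 3))  # occasions 2, 1 given 3
```

Because the correlation between occasions 2 and 3 is close to 1, the denominator for the last two quantities is small, which is why these conditional correlations change so much between models.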



An alternative to the LISREL approach: Use of Instrumental Variables

The maximum likelihood estimation procedure of LISREL is equivalent to the least squares
estimation procedure and gives optimal estimates only when all the variables are normally
distributed. However, we have seen that there is a conflict between this requirement and
that of linear relationships between the underlying variables. The procedure of
instrumental variables, applied to a similar data set by Ecob and Goldstein (1981), avoids
this problem by producing a consistent estimate of the correlation between "true scores"
on the tests, say at ages 7 and 11, under certain conditions. Ecob and Goldstein argued
that these generally hold when the 16 year score is regressed on the 11 year score for
tests of reading and mathematics attainment. The reader is referred to Appendix 6 for
details of the differences between the two estimation procedures.
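
The idea behind the instrumental variable estimator can be illustrated on simulated data. This is a minimal sketch under stated assumptions, not the procedure of Ecob and Goldstein: all errors are taken to be independent of each other and of the true score, the data are simulated rather than NCDS values, and the reliability estimate is the standard errors-in-variables ratio of the ordinary least squares slope to the instrumental variable slope.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (attenuated when x has error)."""
    return cov(x, y) / cov(x, x)


def iv_slope(x, y, z):
    """Instrumental variable slope of y on x using instrument z."""
    return cov(z, y) / cov(z, x)


# Simulated data (hypothetical values): true score t with variance 1.
rng = random.Random(0)
n = 20000
t = [rng.gauss(0, 1) for _ in range(n)]
x = [ti + rng.gauss(0, 0.5) for ti in t]        # earlier test, reliability 1 / 1.25 = 0.8
z = [ti + rng.gauss(0, 0.7) for ti in t]        # teacher rating used as instrument
y = [0.9 * ti + rng.gauss(0, 0.4) for ti in t]  # later test, true slope 0.9

b_ols, b_iv = ols_slope(x, y), iv_slope(x, y, z)
print(round(b_ols, 2), round(b_iv, 2), round(b_ols / b_iv, 2))
```

If the error in the instrument were correlated with the error in the earlier test, the covariance in the denominator of the instrumental variable slope would be inflated and the slope correspondingly underestimated.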



It should be noted that all the models considered in this paper are conditional
models in that the latent variables are related by the structural equation
$\eta = \gamma\xi + \zeta$ and the parameter $\gamma$ defines the expected value of $\eta$
for given, unobserved, $\xi$. For the unconditional model, the structural part of
the model is [equation illegible] and this is equivalent to fixing the covariance $\psi_{12}$
in Appendix 4 in terms of $\psi_{11}$ and $\psi_{22}$ [relation illegible]. The choice
between conditional and unconditional models in the analysis of change is
discussed by Bock (1975), Goldstein (1979a, b), and Plewis (1981). Taking
the two teacher ratings at age 7 separately as instrumental variables we obtain the
following estimates for the test score reliability at age 7 and of the correlation
between test scores across ages 7, 11:

Instrumental Variable                  Estimated reliability    Estimated correlation of test
                                       of 7 year test           "true scores" at ages 7 and 11

Teacher rating of reading ability      0.78                     0.81
Teacher rating of reading standard     0.76                     0.84



The values for the correlation of true scores between ages and of the reliability of
the 7 year test are comparable to those produced by the structural relations approach
of LISREL.

The estimate of the standard error of the correlation is higher (0.01) using the
instrumental variable approach than the values (average 0.005) using the structural
relations approach, reflecting the extra information used in the latter approach.
Formal estimates of the regression coefficient of second occasion true score on first
occasion true score and of its variance under a variety of structural equation models
can be found in Joreskog and Sorbom (1974).

The results of the linear structural relations approach may be used to aid in the
choice of appropriate instrumental variables: a correlation between errors on the
teacher rating of reading and the reading test at age 7 is indicated, which would lead to
a correlation of observed teacher rating with test score error, which gives rise to an
underestimate of the correlation of reading attainment across ages. As non-zero error
correlations involving the reading standard rating are fitted this may be a more
appropriate choice as instrumental variable.

Finally the models M5 and M6 at three occasions are compared with an instrumental
variable analysis of Goldstein (1979a) in Table 11.



TABLE 11  COMPARISON OF LATENT STRUCTURE AND INSTRUMENTAL VARIABLE RESULTS ON READING
ATTAINMENTS AT AGES 7, 11, 16

                                      Fitted relation                            Standard errors       Residual
                                                                                                       variance

Model M5                              $r_{16} = 1.084\,r_{11} - 0.146\,r_{7}$    (0.025) [illegible]   0.06
Model M6                              $r_{16} = 1.029\,r_{11} - 0.080\,r_{7}$    (0.022) (0.018)       0.005
Instrumental variable method (IVM)    $r_{16} = 1.113\,r_{11} - 0.147\,r_{7}$    (0.059) (0.051)       0.30

The lower residual in the latent structure models is partly due to the contribution of
measurement errors in the 16 year test, which is taken into account. The coefficients
in M5 and IVM are within sampling error but M6 gives lower values for each coefficient.

Possible extensions to include more than one attainment and experimental or background
variables

The structural equation modelling can be further extended to the more complex cases
considered by Goldstein (1979a) of incorporating social class and examining relationships
between reading and mathematics attainment at different ages. The models used here have
been solely concerned with the (y, $\eta$) part of the LISREL model, and Social Class
would be included in the (x, $\xi$) part, possibly measured with error using more than one
indicator. An example of this form of analysis is given by Wheaton et al (1977).
However, one needs to be aware of the assumption made in these LISREL models that the
errors in the indicators are uncorrelated with the background variables. Results of
Ecob (1979) suggest this is not true for these data. Hauser (1972) gives a model which
allows for a direct effect of background variables on indicators.

The extra attainment can be modelled by considering more than one latent variable and
specifying the loadings of some indicators on some latent variables as zero. An
alternative method of examining the effects of Social Class on change in reading
attainment is to consider the initial attainment measures as fixed variables, also in
the (x, $\xi$) part of the model. This loses the advantage of modelling test specific
factors over occasions and we have no parameter corresponding to the correlation between
latent variables over occasions. However, the interpretation given to the effects of
Social Class is improved. Joreskog and Sorbom (1977) give models similar to those in
this section.



V Barnett & T Lewis (1978) Outliers in Statistical Data. New York: Wiley, 365 pp.

P M Bentler & P G Bonnet (1981) Significance tests and goodness of fit in the
analysis of covariance structures. Psychological Bulletin 88, pp 588-600.

H Blok & W Saris (1981) Using longitudinal data to estimate reliability: a new
look at the data of Werts, Breland, Grandy & Rock. (Internal Report).

R D Bock (1975) Multivariate Statistical Methods in Behavioural Research.
New York: McGraw-Hill.

C Clogg (1977) Unrestricted and restricted maximum likelihood latent structure
analysis: a manual for users. University Park, Pennsylvania: Population
Issues Research Office, Working Papers, 1977-9.

H L Costner & R Schoenberg (1973) Diagnosing indicator ills in multiple indicator
models. In "Structural Equation Models in the Social Sciences",
Eds: A Goldberger & D Duncan, Academic Press.

D R Cox & N J H Small (1978) Testing multivariate normality. Biometrika
65, pp 263-72.

R Davie, N Butler & H Goldstein (1970) From Birth to Seven: A report on the
National Child Development Study.

R Ecob (1979) Discussion contribution to H Goldstein (1979). JRSS, A, 142.

R Ecob & H Goldstein (1981) Instrumental variable methods for the estimation
of test score reliability (submitted for publication).

R Gnanadesikan (1977) Methods for Statistical Data Analysis of Multivariate
Observations. New York: Wiley, 311 pp.

H Goldstein (1979a) Some models for analysing data on educational attainment
(with discussion). JRSS, A, 142, pp 407-442.

H Goldstein (1979b) The Design and Analysis of Longitudinal Studies.
Academic Press, London.

R M Hauser (1972) Disaggregating a social-psychological model of educational
attainment. Social Science Research, 1, 159-218.

K G Joreskog (1977) Structural equation models in the social sciences:
specification, estimation and testing. In "Applications of Statistics", Ed.
P R Krishnaiah, North Holland.

K G Joreskog and A S Goldberger (1975) Estimation of a model with multiple
indicators and multiple causes of a single latent variable. JASA, Vol. 70, No. 351,
pp 631-9.

K G Joreskog and D Sorbom (1974) Some regression estimates useful in the
measurement of change. Research report 74-2, University of Uppsala.

K G Joreskog and D Sorbom (1977) Statistical models and methods for analysis
of longitudinal data. In "Latent variables in socio-economic models",
Eds. D J Aigner and A S Goldberger, North Holland.

K G Joreskog and D Sorbom (1978) LISREL IV: Users Guide.

R P McDonald (1967) Non linear factor analysis. Psychometric monograph No 15.

R P McDonald (1980) A simple comprehensive model for the analysis of covariance
structures: some remarks on applications. Br. J. of Maths & Statist. Psychology 33.

B Muthen & D Dalquist (1980) LADI-A, Latent analysis of dichotomous indicators,
Version A, User's Guide. Department of Statistics, University of Uppsala.

I Plewis (1981) Analysing Change: Using longitudinal data for the explanation
and measurement of change in the social and behavioural sciences. Final Report
to the SSRC.

D Sorbom (1975) Detection of correlated errors in longitudinal data. British
Journal of Mathematical and Statistical Psychology, 28, pp 130-151.

C E Werts, H M Breland, J Grandy and D R Rock (1980) Using longitudinal data
to estimate reliability in the presence of correlated measurement error.
Educational & Psychological Measurement, 40, pp 19-29.

B Wheaton, B Muthen, D Alwin & G Summers (1977) Specification and estimation of
panel models incorporating reliability and stability parameters. In D R Heise (Ed)
Sociological Methodology 1977.



Figure 1. 2 indicators of 1 latent variable (true score) at each of three occasions:
model for Werts et al (1980) data. [Path diagram not reproduced; the two sets of indicators
are tests and essays.]

Simplex model for true scores; errors on one set of indicators are correlated.
NOTE: The more general structural model has a coefficient $\beta_3$ relating $\eta_3$ and $\eta_1$
independently of $\eta_2$, giving the following matrix for B:

$$B = \begin{pmatrix} 1 & 0 & 0 \\ -\beta_1 & 1 & 0 \\ -\beta_3 & -\beta_2 & 1 \end{pmatrix}$$

Figure 3. A model with 2 occasions and 3 indicators at each occasion: 1 latent
variable at each occasion. [Path diagram not reproduced.]

Figure 4. [...] each occasion with uncorrelated errors. [Path diagram not reproduced.]




APPENDIX 1  LISREL SPECIFICATION OF WERTS ET AL MODEL

The simplex true score model used is formulated as

$$y = \Lambda_y \eta + \varepsilon, \qquad B\eta = \zeta$$

where

$$B = \begin{pmatrix} 1 & 0 & 0 \\ -\beta_1 & 1 & 0 \\ 0 & -\beta_2 & 1 \end{pmatrix}, \qquad
\Lambda_y = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \\ \lambda_4 & 0 & 0 \\ 0 & \lambda_5 & 0 \\ 0 & 0 & \lambda_6 \end{pmatrix}, \qquad
\Psi = \begin{pmatrix} \psi_1 & 0 & 0 \\ 0 & \psi_2 & 0 \\ 0 & 0 & \psi_3 \end{pmatrix}$$

and $\Theta_\varepsilon$ is the covariance matrix of the errors $\varepsilon$ on the six indicators
$y_1, \ldots, y_6$ [entries illegible].

Two methods exist for fixing the scale of the latent variables at each occasion.
One method is to fix the loading of one indicator at each age. Here we adopt the
alternative solution of fixing the loading of one indicator, $\lambda_1$, for the first occasion
latent variable, $\eta_1$, and fixing $\beta_1$, $\beta_2$ to fix the scales of $\eta_2$, $\eta_3$ in terms of
the scale of $\eta_1$.

We then use the standardised solution, which sets the variance of each latent variable
to 1, to obtain the correlation between latent variables at each occasion. The error
variance of indicator i is then equal to $(1 - \lambda_i^2)$ where $\lambda_i$ is the
loading on the latent variable.

Control card listing, parameter specifications, LISREL estimates and standardised
solution follow for the most general "essay error correlation" model.



V . .^ . II f-'T" " "M* 

x <o identified but not the model witn ^ 
tests or between errors) is identified d - 

error correlations. „ v 



nation oetween latent variables and the loading X - « * 
TaKing the essay error correlations to be.zero we have 0,,-^- . . 
the situation, in the earlier model. 

. ,.-„ the covariances between 9./^/ 
We, proceed to use the cov ^ and see that it 

and y + .y ? to obtain a value .f or .Vcr (If,,) and V** 



being known. 



X,X + *" Var XV 



depends on 
We have 

Cov (^y + -> 

Coy <y„y 5 > - <M.+ E, ) (X*»*e«>-> 

. ' .Coy (y.,y s ) = < ty, + e 4 > < ^ > = 

, \ - « * C and as Cov ( % , C. V"= Cov ( ^. . > =° 

We have also T| t - M, + bj, 

>Cov< ^ >=Var ( >. Similarly Cov < ^ £j 

and i» £ixed at i; ° 



So we obtain Cov ( 9 , , •» > " ^ Var ( ^ } ^ 
Cov( 9l , 3% >- X. Va r (V » © 
CovC^ V~ \X, Var PI, > © 



• tl „3bV the product of equations 1 and 2 we obtain Var ( 1, ) and " 
Dividing equation 3 by tne prouu^v 

■ ZKXvo-KK respectively. M«« is not known 

^insertion in equations l, s 2 give 5 

th en equation 3 beco.es . X ^N*r 

♦ ^-solved Moreover there are no other equations 
and the 3 equations cannot be solved. 

relating X, , X^ and X, . 

K i Ves X fa and gives overidentiiication 
Repeating this process with JJ,. ifc, Jjo . B ives 
of Var (1J,), X«j andCov,(U 5/ y t > - M« Var ) 

giving Var Hi). > . Cov ( y ) or by Cov 

X..X, are found by . <* / M 

^ r«v ( u IU > giving again over idenifi cation, var mi, # 
<.y.,yo >» CoV ( > B ' „„ ft o <v.. are found 

/ x >) VarfaW and finally & 0 (J1 %j 

• * Cov < >. Cov „ » ». - • ^ 2 ^ civiM 

Ljf\l^ 4 parameters, Var (T), >,A, A 2, ^> # 
^U>.l^.^flfn»d^ toBting. the model. 110 1 _ . 



APPENDIX 3  REFORMULATION OF THE ERROR CORRELATION MODEL WITH A TEST SPECIFIC FACTOR

Let us consider a model where the indicator which has correlated errors instead has
loadings on a test specific factor ($\eta_4$), the errors being then uncorrelated. The
other indicator has no loadings on this factor.

We then have $y = \Lambda_y\eta + \varepsilon$, $B\eta = \zeta$, where

$$\Lambda_y = \begin{pmatrix} \lambda_1 & 0 & 0 & \lambda_7 \\ 0 & \lambda_2 & 0 & \lambda_8 \\ 0 & 0 & \lambda_3 & \lambda_9 \\ \lambda_4 & 0 & 0 & 0 \\ 0 & \lambda_5 & 0 & 0 \\ 0 & 0 & \lambda_6 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$\lambda_8$, $\lambda_9$ are free.

$\lambda_7$ is fixed and the identification process is identical to that given in Appendix
2 apart from Cov($y_1$, $y_2$), Cov($y_1$, $y_3$), Cov($y_2$, $y_3$).

Now Cov($y_1$, $y_2$) = $\lambda_1\lambda_2$ Var($\eta_1$) + $\lambda_7\lambda_8$ Var($\eta_4$)

and as $\lambda_1$, $\lambda_2$, Var($\eta_1$), $\lambda_7$ are known we obtain $\lambda_8$ Var($\eta_4$).

Similarly Cov($y_2$, $y_3$), Cov($y_1$, $y_3$) give $\lambda_8\lambda_9$ Var($\eta_4$), $\lambda_9$ Var($\eta_4$) and
by a similar process to the initial identification in the earlier model we
obtain values for each of these quantities.

So, for the case where 3 indicators are correlated this is identical to the
existence of a test specific factor for these indicators, the loadings being the same
when the covariances are equal. Note that an error correlation between only 2
indicators does not imply the existence of a test specific factor, as the two
parameters involved cannot be determined by the one covariance, and where more than
three indicators are present overidentification of this factor will result.

As the model with error correlations between both sets of indicators is unidentified
so also is the equivalent model with a test specific variable for each type of
indicator.
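
The equivalence between three mutually correlated errors and one test specific factor can be sketched numerically. This is a minimal illustration, assuming the loading of the first indicator on the factor is fixed at 1 and the factor is orthogonal to the other latent variables; the function name and the covariance values are hypothetical.

```python
def factor_from_error_covariances(c12, c13, c23):
    """Recover a single orthogonal factor from three error covariances.

    The factor loads on indicators 1, 2, 3 with loadings 1 (fixed), l2, l3
    and has variance v, so that c12 = l2*v, c13 = l3*v, c23 = l2*l3*v.
    All three covariances must be non-zero, and their product positive
    for the variance to be positive.
    """
    v = c12 * c13 / c23   # variance of the test specific factor
    l2 = c23 / c13        # loading of indicator 2
    l3 = c23 / c12        # loading of indicator 3
    return v, l2, l3


# Hypothetical error covariances generated by v = 0.5, l2 = 0.8, l3 = 1.2.
v, l2, l3 = factor_from_error_covariances(0.4, 0.6, 0.48)
print(round(v, 3), round(l2, 3), round(l3, 3))
```

Three covariances determine exactly three parameters, so the two formulations fit equally well; with only two indicators there is one covariance for two parameters, and with more than three the factor imposes restrictions on the covariances.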






APPENDIX 4

LISREL SPECIFICATION OF 2 OCCASION MODEL WITH 3 INDICATORS AND
1 LATENT VARIABLE AT EACH OCCASION

The model following allows for correlation of errors between occasions and has
no x variable, and is identical to that in Sorbom (1975) and to the
confirmatory factor analysis model, Example 5 in the LISREL Manual.

$$y = \Lambda_y\eta + \varepsilon, \qquad \eta = \zeta, \qquad
\Lambda_y = \begin{pmatrix} \lambda_1 & 0 \\ \lambda_2 & 0 \\ \lambda_3 & 0 \\ 0 & \lambda_4 \\ 0 & \lambda_5 \\ 0 & \lambda_6 \end{pmatrix}$$

$\lambda_1$, $\lambda_4$ are fixed at 1.

$\psi_{12}$ is the correlation of the latent variable between occasions.
It is important to note that the LISREL User's Guide (1978) gives the following
model for change in ability between occasions (Section III.3 and Example 7):

$$y = \Lambda_y\eta + \varepsilon, \qquad x = \Lambda_x\xi + \delta, \qquad \eta = \gamma\xi + \zeta$$

where $\eta$, $\xi$ are latent variables [the loading matrices are illegible].

However, this only allows error correlations within occasions to be fitted.

Control card listing, parameter specifications and fitted estimates follow
for the final model for data set "O".



APPENDIX 5  MODEL CHOICE PROCEDURE IN AN EXPLORATORY SITUATION

The literature of Joreskog and Sorbom is confusing on the recommended procedure
for model choice in this situation. Joreskog and Sorbom (1977) state

    "In a more exploratory situation the $\chi^2$ goodness-of-fit values
    can be used as follows. If a value of $\chi^2$ is obtained
    which is large compared to the number of degrees of freedom,
    the fit may be examined _by an inspection of the residuals,
    i.e. the discrepancies between the observed and the reproduced
    variances and covariances_. The examination, in conjunction
    with subject-matter considerations, may suggest ways to relax
    the model somewhat by introducing more parameters. The new
    model usually yields a smaller $\chi^2$. A large drop in $\chi^2$,
    compared to the difference in degrees of freedom, supports
    the changes made. On the other hand, a drop in $\chi^2$ which is
    close to the difference in number of degrees of freedom indicates
    that the improvement in fit is obtained by capitalizing on
    chance."

However, in Joreskog (1977) we find the same paragraph except that the underlined
words are replaced by "by an inspection of the magnitude of the first
derivatives of F with respect to the fixed parameters." In fact earlier
literature (Costner and Schoenberg (1973), Sorbom (1975)) points to deficiencies
in the analysis of residuals. Using simulated 2 occasion data of the type used in our
second example, which include non zero correlations between errors within occasion
(Costner and Schoenberg) and within and between occasions (Sorbom), this shows that
the iterative procedure which frees the largest residual at each stage
results in the incorrect parameters being freed. Costner and Schoenberg find that
the correct model is found by an analysis of the set of submodels which exclude
at least one indicator at each occasion, and Sorbom finds that an analysis of the
first order derivatives gives the correct model. The latter method is used here
as it is much more economical. However, as Sorbom mentions, the freeing of the
parameter with the largest first order derivative will not necessarily give the largest
decrease in $\chi^2$, as another parameter with greater change in its value between models could
theoretically give a larger decrease in $\chi^2$. Some idea of whether the models found
are the correct ones can be given by comparing the order of freeing the first
derivatives with the ordering of the freed correlations in each data set.

In "O" none out of a possible 6 order changes occur,
in "OSc" 1 out of a possible 10 and in "V" 4 out of a possible 6 occur.
This gives some limited confidence that the best model with the given number of
freed parameters has been found. However, imposing the final model of "O" on
"OSc" gives a lower $\chi^2$ ($\chi^2$ = 15.2) than the equivalent model in fact chosen ($\chi^2$ = 15.7).
Only one of the parameters freed, (5, 6), was the same in each case.
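
The comparison of a drop in $\chi^2$ with the difference in degrees of freedom can be sketched as a formal difference test. This is one conventional reading of "large compared to the difference in degrees of freedom", not a procedure stated in the text; it assumes one parameter is freed at each step, and uses the successive $\chi^2$ values reported for data set "O" in Table 3.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def drop_in_chi2(chi2_before, chi2_after):
    """Drop in chi-square when one extra parameter is freed, with its p value."""
    drop = chi2_before - chi2_after
    return drop, chi2_sf_1df(drop)


# Successive models for data set "O" (chi-square values from Table 3).
fits = [492.0, 187.0, 26.0, 15.1, 5.4]
for before, after in zip(fits, fits[1:]):
    drop, p = drop_in_chi2(before, after)
    print(f"{before:6.1f} -> {after:6.1f}  drop {drop:6.1f}  p = {p:.3g}")
```

Because the parameter to free is chosen after inspecting the data, these p values overstate the evidence for each step and are best read as descriptive.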




APPENDIX 6  THE LINEAR STRUCTURAL RELATIONS (LSR) AND INSTRUMENTAL VARIABLE (IV)
ESTIMATORS COMPARED

Estimation method
    IV:   Least squares.
    LSR:  Maximum likelihood.

Condition for consistent estimates of regression (or correlation) coefficient
    IV:   IVs are not correlated with errors of measurement or disturbances.
    LSR:  Correct model is chosen, and observed variables are jointly normally
          distributed.

Effect of correlations of measurement errors between occasions on the regression
coefficient
    IV:   None.
    LSR:  None, if incorporated into the model.

Effect of correlations of measurement errors within occasions on the regression
coefficient
    IV:   Inconsistency if this leads to a correlation between the observed
          instrumental variable and test score measurement error.
    LSR:  None, if incorporated into the model.

Action possible when there are non linear relations between normally distributed
variables
    IV:   Transform independent variable to a linear relation with dependent
          variable.
    LSR:  No action possible to give consistent estimate.

Efficiency of estimates
    IV:   Dependent on the correlation of instrumental variable(s) with independent
          variable; generally suboptimal.
    LSR:  Efficient if variables are normally distributed.

Effect on variance of regression coefficient of increase in number of tests at
either occasion
    IV:   If all used as instrumental variables, variance generally decreases
          (see Ecob & Goldstein, 1981).
    LSR:  Variance generally decreases (see Joreskog and Sorbom, 1974).



APPENDIX 7

ESTIMATING THE INCONSISTENCY OF INSTRUMENTAL VARIABLES ESTIMATES IN THE
CASE OF CONGENERIC VARIABLES WITH CORRELATED ERRORS

By Russell Ecob

The three variables, the independent, dependent and instrumental variables,
are represented as congeneric variables with correlated errors, this being
known as a reformulation of the test specific latent variable (see Appendix 3).
The dependent variable, however, contains two error components, one being
a disturbance term in the regression on the independent variable which is
assumed to be uncorrelated with the error of measurement of the instrumental
variable.

Let $x_1$, $x_2$, $x_3$ be the observed values of the independent, dependent and instrumental
variables respectively.
Then we have

c 

x 3 = fai - *r 

'where * o , ' J * '/ J * , £ e 3 u o - and j£f u = 0 

and £ (e<) ~o , G- (*•*) =-o 

The true scores , 6, , -at occasions 1 and 2 arc given by 

tu ' fat 4 * \ 

giving the following relation between true scores; , ( « , 

or •+ °* where ^ ^ is tne re C r ession 

coefficient in the relation between the true scores at the two occasions. 

Let us denote the correlation between errors , Cj on variables x^ and 

by ./bj and let the reliability of x, be . 

The Instrumental Variables estimatc^of the regression coefficient of on 
is 

— - ~« -«* 



- 2 -. * 

and tt3 ff. . Jik^ nftor somo 

simple algebra wo obtain 



and the relative inconsistency, T - - /&v - given by —A, 



whore f\ ,< .- , | (— ^. / _ ^ 



When A, f have the same reliability, II 



For thin? partii ular model the inconsistency of, *ihe instrumental variables 
estimator is given m terms of the unknown correlation of indicator errors 
and reliabilities. However thiT modei^is similar to the structural 
relations model, v/here an extra indicator is required *or_the dependent 
variable, the true indicatorJ of the independent variable being X, ow/ 

* i ■ ■ 

Two of the instrumental variables used in Appendix 5b, the verbal component
of the General Ability test and the teacher rating of oral ability, are used
simultaneously as indicators of the 11 year reading attainment. Though the
instrumental variables analysis uses each of these indicators separately as
instrumental variables, the comparability of estimates of the regression
coefficient formed by correcting the separate instrumental variable estimates
using the error correlations and reliabilities estimated from the structural
equations model will provide an indication of the consistency of the two
approaches with each other. It is known also that fitting non-zero correlations
between errors of particular indicators will affect other fitted error
correlations in the model. Thus other indicators may obscure a non-zero
correlation between two particular indicators. The effect of this is examined
by forcing particular error correlations to be freed early in the fitting process.
The following is a list of the indicators used:

1) Test of reading at age 11
2) Test of reading at age 16
3) Oral teacher rating at age 11 (IV1, 1st instrumental variable)
4) Verbal component of General Ability Test at 11 (IV2)
5) English teacher ratings at age 16



As the analysis in Appendix 6 suggested, the largest error correlation by far
is that between the two tests (1, 2). Freeing the error correlations in terms
of their highest first order derivatives freed (1, 3) and then (1, 4),
giving an acceptable fit to the data ($\chi^2_1 = 2.6$), and fitting first (1, 4)
gave again (1, 3) as the next freed parameter, producing the same model (Model 1).

Alternative models were obtained by fitting either (2, 4) or (2, 3) after
(1, 2), giving the freed error correlations for adequately fitting models for
Model 2 between (1, 2), (2, 4) and (1, 3) and for Model 3 between (1, 2), (2, 3)
and (4, 5).



The estimates of the parameters for the different models are given in the table
below together with estimates of the inconsistency, I. Estimates of the
regression coefficient are then obtained using the instrumental variable
estimates of 0.989 and 1.014 for IV1 and IV2 given in Appendix 5b.



                          Model 1     Model 2     Model 3

Error correlations
    1,2                    .102        .091       .0[illegible]
    1,3                    .041        .039
    1,4                   -.017
    2,3                                           -.03[illegible]
    2,4                                .016
    2,5

Reliability estimates
    Test                   0.73        0.74        0.78
    IV1 (3)                0.70        0.70        0.75
    IV2 (4)                0.62        0.63        0.59

IV1     I                 -0.016      -0.015      -0.011
        A                  1.005       1.004       1.006
IV2     I                  0.008       0.008       0.000
        A                  1.006       1.006       1.014



The corrected estimates of the regression coefficient corresponding to the
different instrumental variable estimators are very similar in Models 1 and 2
but differ more in Model 3. Models 1 and 2 also produce similar estimates of
this coefficient. The estimates of correlations between the underlying variables
at each age were found to be 0.981 using the corrected instrumental variables
estimator with estimates of error variance from the structural equations
analysis and 0.976 directly from the structural equations analysis.
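
The corrected coefficients tabulated above can be reproduced, to three decimals, under one reading of the relative inconsistency. The sketch below assumes that I is the (limiting) IV estimate minus the true coefficient, divided by the true coefficient, so that the corrected estimate is the IV estimate divided by (1 + I). That definition is an assumption made here because it matches the Model 1 and Model 2 figures; the function name is illustrative only.

```python
def corrected_estimate(iv_estimate, relative_inconsistency):
    """Remove an assumed relative inconsistency I from an IV estimate.

    Assumption: I = (limit of IV estimate - true value) / true value,
    hence true value = IV estimate / (1 + I).
    """
    return iv_estimate / (1.0 + relative_inconsistency)


# IV estimates quoted from Appendix 5b; I values as tabulated for Models 1 and 2.
IV1_ESTIMATE, IV2_ESTIMATE = 0.989, 1.014
INCONSISTENCY = {"Model 1": (-0.016, 0.008), "Model 2": (-0.015, 0.008)}

for model, (i_iv1, i_iv2) in INCONSISTENCY.items():
    print(
        model,
        round(corrected_estimate(IV1_ESTIMATE, i_iv1), 3),
        round(corrected_estimate(IV2_ESTIMATE, i_iv2), 3),
    )
```

Under this reading Model 1 gives 1.005 and 1.006, and Model 2 gives 1.004 and 1.006, in agreement with the tabulated values.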



Missing data in the NCDS: the use and evaluation of a method of Beale & Little
for estimating partially missing responses

By Russell Ecob

The Problem

The NCDS study was by most standards remarkable for the high follow up rate
of respondents at earlier stages, 87% of the original birth cohort providing
at least partial responses at 16 years, 91% at 11 years and at 7 years. Were
those who did not respond at later stages different in any ways from the
overall sample, particularly in ways which would affect their 16 year score?
Goldstein in Fogelman (1976) addressed himself to this question and found
a tendency for non respondents at 16 years who responded at earlier stages
to come from disadvantaged groups at ages 7 and 11 including illegitimate
children, those who received special education and those exhibiting
"anti-social" behaviour in school, these categories providing 3 - 6% of the
children in the survey.

In addition when those children with 16 year data were compared with those
without, biases were found in county of residence and type of accommodation, the
direction of bias being such that the proportion of children in categories
associated with lower school attainment is underestimated when only children
with some data at 16 years are considered. The exception to this rule was
an over-representation of children from small families among those with no
data at 16 years.

Goldstein also carried out analyses of the change in test scores between 7 and
11 years for those with data at these ages and with and without data at 16 years.
The analyses considered the 7 year score as a covariate and either considered
the response contrast between those with some data at 16 years versus those without
or considered these categories separately, in the latter case including Social Class
or number of children in the household as a factor.






Whilst the response/non-response contrast was significant for the mathematics
attainment, no difference in regression coefficient of 11 year on 7 year score
for either mathematics or reading was found between those with some data
compared to those with no data at 16 years, though a difference between non
manual and Social Class V children was increased for reading and mathematics
by 3 and 2% respectively. Extrapolating the response/non-response contrast
for mathematics to 16 year score gave a bias of 0.05 years of attainment when
estimates using the available data are used. An explanation for the significant
response/non-response contrast for mathematics attainment not being shown up
in a difference between regression coefficients in the "response at 16 years"
group and the total group is the small percentage, 9%, who had no response at
16 years but who responded at 7 and 11 years.



We now ask how relevant such an analysis is to the problem of non-response in more
complex analyses of the NCDS data which examine relationships between a number
of variables at each age. Examples are the analysis of Goldstein (1979) and the
instrumental variable estimation described in [chapter reference illegible in
source]. Both these analyses require data which have responses on both attainment
tests, teacher ratings and social class at each of the three ages 7, 11 and 16.
This gives sample sizes down to 5100 cases, less than a third of the overall
sample and much less than the 8900 cases examined in Fogelman (1976) with
relevant responses at 16 years.

All techniques which estimate the non-response bias need to make the assumption
that, given the information recorded, a non-respondent on a particular variable
is equivalent to a respondent. This is called the "missing at random" assumption
by Marini, Olsen and Rubin (1980). It can be seen that any description of
differences, for instance between children with and without 16 year data, is
restricted to the variables measured at earlier ages. Moreover the analysis which
examined response bias at 16 years included only those children who were observed
on the variables included in the analysis at ages 7 and 11 and, even where only
attainment tests were considered, excluded 47% of the total sample. The
assumption was made that these 53% were the same as the rest of the sample in
the scores at 7 and 11 years.

Before describing an analysis which conceptualises missing values more generally
we present an analysis which examines the difference between 16 year respondents
and non respondents, where the "16 year respondents" group have responses on both
reading and mathematics attainments at 16 and both groups have responses on these
attainments at 7 and 11 years in addition to a measure of Social Class at 7 years.
This analysis differs from that of Goldstein reported earlier in that out of
12864 having the responses at 7 and 11 years only 9420 (72.7%) have responses
at 16 years on both attainments (this sample being similar to that in Goldstein
(1979), Tables 2, 4).

Table 1 gives the regression coefficients of 11 year on 7 year scores for both
attainments and the fitted constants corresponding to social class at 7 years
when those with and without 16 year data are included.



TABLE 1  REGRESSION OF 11 YEAR ON 7 YEAR READING AND MATHEMATICS
         ATTAINMENTS SEPARATELY WITH AND WITHOUT 16 YEAR SCORES

                                  Number     11 on 7 yr     Social class
                                  of cases   regression     fitted constants
                                             coefficient

Reading test
    All data                      12864      0.61           0.1[?]  -0.08  -0.25
    16 year respondents only       9420      0.60           0.19    -0.09  -0.25

Mathematics test
    All data                      12864      0.55           0.24    -0.10  -0.37
    16 year respondents only       9420      0.54           0.25    -0.12  -0.40




Both 7 and 11 year tests are standardised so the regression coefficients are
partial correlation coefficients when Social Class (in 3 categories) is
controlled for. The regression coefficients differ by 1-2% in the two groups and
the constants fitted to social class categories differ by 7-10%. These
differences are much larger than those given in Fogelman (1976) and raise the
question whether larger differences would be found if more data were excluded.

A more general conception of missing values

We now consider a missing value pattern. A particular pattern is defined by
the set of variables on which the values are missing, and given a set of variables
of interest a variety of patterns will occur. For instance in the previous
analysis just two patterns were considered, complete information on all selected
variables versus missing information on one or more 16 year attainments only.
Given a selection of n variables, the total number of possible missing value
patterns is $2^n$, though not all of these will always occur. In general the more
variables that are considered the more likely the "missing at random" assumption
is to hold, as differences on other variables for the cases which are missing on a
given variable are found.
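
As a small illustration of the idea of a missing value pattern, the sketch below codes each case as a tuple of 0/1 flags (0 = missing, 1 = present, the coding used for the pattern table later in this appendix) and counts how often each pattern occurs. The function names and the toy records are hypothetical and are not taken from the NCDS files.

```python
from collections import Counter


def pattern(record):
    """Code one case as a tuple of flags: 1 = value present, 0 = value missing."""
    return tuple(0 if value is None else 1 for value in record)


def count_patterns(records):
    """Tabulate how many cases fall into each observed missing value pattern."""
    return Counter(pattern(record) for record in records)


def n_possible_patterns(n_variables):
    """Each of n variables is either present or missing, giving 2**n patterns."""
    return 2 ** n_variables


# Toy data: three variables per case, None marking a missing value.
records = [
    (21.0, 0.3, 0.1),
    (18.0, None, None),
    (25.0, 0.7, None),
    (19.0, None, None),
]
observed = count_patterns(records)
print(len(observed), "of", n_possible_patterns(3), "possible patterns observed")
```

With ten variables, as in the analysis that follows, `n_possible_patterns(10)` gives 1024.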

The Beale and Little method for estimating missing values

Beale and Little (1975) examine methods for estimating missing values which for
any missing value find an estimate by the regression on the variables having
known values. The estimates are in turn taken as known values for the
estimation of further missing values. The process is allowed to iterate until
convergence occurs. Six methods are compared, 2 of which are maximum likelihood
in some sense, 1 which uses only ordinary least squares estimation and 3 of
which use a combination of both methods, where firstly missing independent
variables are fitted by modified maximum likelihood using the independent
variables only and then a dependent variable is fitted by weighted least squares
on the estimated values. The method used in this paper is Method 6, one of the
last 3 methods, where weights are estimated from the data. A more straightforward
method using maximum likelihood estimation which does not require iteration is
given by Marini, Olsen and Rubin (1980), which requires that the missing values
be nested so that no cases occur which both have variable x missing and
variable y present and vice versa. This is considered likely to apply to
longitudinal data where there is gradual exodus from the study but does
not occur in the NCDS study and so this method is not examined further,
though non-nested data can be considered as lying between two nested
bounds by excluding some of the values. In addition the method of Marini,
Olsen and Rubin requires normality of distributions of variables for the
desirable properties of the estimates to be shown, this not being necessary
for the least squares method of Beale and Little (1975).
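
The nesting requirement described above can be checked mechanically. The following is a minimal sketch, not part of either published method: it assumes each case is coded as a tuple of 0/1 flags (0 = missing) and treats the data as nested when, for every pair of variables, the cases missing on one are a subset of the cases missing on the other. The function name `is_nested` is hypothetical.

```python
def is_nested(patterns):
    """Return True if the missing value patterns are nested.

    patterns: list of tuples of 0/1 flags, one tuple per case (0 = missing).
    Nested means that no pair of variables x, y exists with some case missing
    x but not y and, at the same time, another case missing y but not x.
    """
    if not patterns:
        return True
    n_variables = len(patterns[0])
    # For each variable, the set of case indices on which it is missing.
    missing_cases = [
        {case for case, flags in enumerate(patterns) if flags[var] == 0}
        for var in range(n_variables)
    ]
    return all(
        a <= b or b <= a
        for i, a in enumerate(missing_cases)
        for b in missing_cases[i + 1:]
    )


# Gradual drop-out from a three-wave study gives a nested pattern set ...
print(is_nested([(1, 1, 1), (1, 1, 0), (1, 0, 0)]))
# ... whereas a case that returns after missing a wave breaks the nesting.
print(is_nested([(1, 1, 0), (1, 0, 1)]))
```

The first call reports a nested set and the second a non-nested one, which is the situation the text describes for the NCDS.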


Examination of the Beale and Little method in NCDS data

The Beale and Little method produces consistent estimates of the first
and second moments of all variables in a dataset if the missing values are
missing at random, irrespective of the distributional aspects of the data.

We now focus on the effect of using different sets of ancillary variables for the
estimation of relationships between certain 'key' variables. The fact that the
relation of 7 to 11 year reading and mathematics tests differs for the
respondents and non-respondents at 16 years implies that given this set of
variables the 16 year test data are not 'missing at random'. So for the missing
16 year data, using only 7 and 11 year test data to derive values for 16 year
data will give biased regression coefficients of 16 on 11 year data. However,
using additional variables, social class and teacher rating measured at each age,
we have extra information on individuals with one or more test scores missing at
age 16 and in some cases even data at this age.

It is likely that for the cases having observations on these extra variables
this information can be used to reduce the inconsistency in the estimation of
parameters pertaining to the missing values. Two factors are likely to limit the
usefulness of this method. The first is the dependency of the "missingness" on
variables which are not included in the ancillary variable set. For instance the
non-respondents at 16 years were found to be over represented in small families.
If this variable has an effect on the 16 year scores after controlling for the
ancillary variables then estimates will be inconsistent. Secondly the
ancillary variables are often themselves missing in many of the observations
with missing values on the 16 year scores. For these cases the ancillary
variables can only be used indirectly through the preliminary estimation of the
missing values of the ancillary variables through other variables.

However, the use of social class and also measures of attainment at 16 years as
ancillary variables is expected to control to some extent for the
dependency of the "missingness" of the test data on other factors.

The following analysis examines the effect of the interpolation of missing values
using various subsets of ancillary variables on various estimates of the relation
of 16 year to 11 year and of 11 year to 7 year tests of reading.
These values are compared to those obtained by using only cases with values on all
these variables and using cases with values on all variables involved directly in
estimation of the parameters associated with 16 year and 11 year tests and the
relationship between them.



The data and the missing value patterns

10 variables were included, these being tests of reading attainment and teacher
ratings of reading and Social Class at each of the ages 7, 11 and 16 and also
a test of General Ability at 11 years. The key variables were the reading tests
at ages 11 and 16 and the teacher rating of reading at age 11, other
variables being ancillary. Excluding cases with no observation on any of these
variables left 17070 cases. 231 out of the 1024 possible missing value patterns
were found and of these 5 each accounted for greater than 3% of the data and a
further 12 for between 1 and 3% of the data. These are given in Table 2 where 0
denotes the value of this variable being missing and 1 present.

TABLE 2  Missing value patterns accounting for greater than 1% of the data

Columns of the original table: test 7, tr 7, sc 7, test 11, general ability 11,
tr 11, sc 11, test 16, tr 16, sc 16 (tr = teacher rating, sc = social class),
frequency and percentage of cases.
[The 0/1 entries for the individual patterns are not legible in the source; only
the frequencies and percentages are reproduced below, in the order printed.]

                                           Frequency       %

Patterns each accounting for > 3%             6114        35.8
                                              1795        10.5
                                               923        [illegible]
                                              1378         8.1
                                               577         3.4

Patterns each accounting for < 3%              205         1.2
but > 1%                                       256         1.5
                                               437         2.6
                                               255         1.5
                                               195         1.1
                                               334         2.0
                                               301         1.8
                                               176         1.0
                                               216         1.3
                                               341         2.0
                                               236         1.4
                                               166         1.0

Total for the patterns listed                13827        81.6

All cases                                    17070       100






Only 50.5% of the data conform to patterns where all data at a given age are
either present or absent and so much information on missing values is generally
provided by other data collected at the same occasion. 28% of the data has
information missing only at 16 years and 19.8% had observations on teacher
ratings and/or on the general ability test which were not on the reading test
of that age or vice versa.

The Beale and Little program was run on a random sample of 919 cases, this being
near the maximum sample size within computer memory limitation. It was
found that a reordering of the data on the three test variables to bring similar
missing value patterns together in the ordering improved the performance by a
factor of 0.3 and so this was done for all runs. Estimates were obtained for
the full data of the mean test scores on the reading test at 7, 11 and 16 years
and the mean values of the teacher ratings of reading at 7 and 11 years. Also
obtained were the ordinary least squares and instrumental variable regression
coefficients of 16 year on 11 year and 11 year on 7 year reading tests and the
derived reliability estimates of the reading tests at 11 and 7 years as outlined
in [reference illegible in source]. The instrumental variables used in the two
regressions were the teacher ratings of reading at 11 years and 7 years
respectively.
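
The three quantities named above can be sketched in a few lines. The sketch assumes the usual errors-in-variables reading, in which the ordinary least squares slope is attenuated by the reliability of the independent variable while the instrumental variable slope is not, so that the derived reliability is the ratio of the two. That interpretation is an assumption, although the coefficients reported later in this appendix are consistent with it (for example 0.768 / 0.929 is approximately 0.827). All function names and the toy data are hypothetical.

```python
from statistics import mean


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(x, y, z):
    """Instrumental variable slope of y on x, using z as the instrument."""
    return cov(z, y) / cov(z, x)


def derived_reliability(ols, iv):
    """Reliability of x implied by the attenuation of the OLS slope (assumed)."""
    return ols / iv


# Toy data: true score t, error e uncorrelated with t, observed x = t + e,
# outcome y = 2t, and an error-free instrument z = t.
t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
e = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]
x = [ti + ei for ti, ei in zip(t, e)]
y = [2.0 * ti for ti in t]
z = t

b_ols = ols_slope(x, y)
b_iv = iv_slope(x, y, z)
print(round(b_ols, 3), round(b_iv, 3), round(derived_reliability(b_ols, b_iv), 3))
```

In the toy data the IV slope recovers the true value of 2 while the OLS slope is shrunk by the ratio of true-score variance to observed variance (3.5 / 5.9).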

Ten runs were made. The first two, numbers 1 and 1a, did not use the missing
values program and gave estimates respectively for the data sets with complete
values of all key and ancillary variables and for the data sets with complete
values of the key variables only. These are called the "complete case" and "key"
datasets respectively. Datasets 2-9 gave estimates using different sets of
ancillary variables generated according to the following pattern.



Social Class                                      +        +        -        -
Ancillary attainment measure                      +        -        +        -

16 year data on ancillary variables               2       4,5       3        /
No 16 year data on ancillary variables            6       8,9       7        /

Here + denotes the inclusion of the particular ancillary variable and - the
exclusion. The ancillary attainment measure includes the teacher ratings and
also the general ability test at 11 years. Runs 4 and 5 differ by excluding
in run 5 the teacher rating at 11 years. Similarly for runs 8, 9. Runs 6-9
exclude all 16 year variables from the data and so exclude respectively social
class and teacher rating; social class; teacher rating at 16 years in runs 6, 8
and 7. Run 2 includes all ancillary variables.

The analyses of the means and of the regression coefficients are considered
separately.



The means of the three key variables and the 7 year teacher rating are given here
for each of runs 1, 1a and 2.

                                       Tests                       Teacher rating
Run No.              Sample size   7 year   11 year   16 year     7 year   11 year

1  (complete case)   [illegible]   28.8     0.11      0.11        2.71     2.77
1a (key)             500           /        0.09      0.09        /        2.80
2                    [illegible]   27.5     0.04      0.07        2.86     2.86

All other runs 3-9 gave estimates close to run 2.

Expressing the changes between runs relative to the estimated standard deviation
in the estimated dataset 2, these were in percentages:






                       Tests                       Teacher rating
Runs               7 year   11 year   16 year     7 year   11 year

1 v 1a             /        2.8       1.4         /        3.6
1a v 2             /        5.5       9.3         /        7.2
1 v 2              3.1      8.3       10.7        21.2     10.8
2 v 3-9            [values illegible in source]

The values in the row 2 v 3-9 are the maximum difference between any of runs 3-9
and run 2 and show that even using only the 11 year attainment measures as
ancillary variables (run 7) gives values very close to the estimates obtained
using all ancillary variables (a maximum difference of 0.006, in the 16 year
test, ever found). In contrast the estimation using ancillary data produced
estimates substantially different (max 9.3% on 16 year test) from the "key" dataset
and this in turn produced estimates differing in the same direction from the
complete case datasets (maximum difference 3.6% in 11 year teacher rating).



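The estimates for the regression and reliability coefficients are given in Table 3.
Percentage change between runs shows the following (OLS = ordinary least squares
regression coefficient, IV = instrumental variable regression coefficient,
r = reliability estimate; the ages of the two tests are given in brackets):

% change

Runs         OLS (11,16)   IV (11,16)   r(11)    OLS (7,11)   IV (7,11)   r(7)

1 v 1a       7.5           4.9          2.8      /            /           /
1a v 2       0.5           1.3          0.9      /            /           /
1 v 2        8.0           6.2          1.9      1.2          5.0         5.8
2 v 3-9      0.5           1.2          1.1      0.7          0.1         0.2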



Here for the regression coefficients the differences between the runs using the
different ancillary variables (max. 1.2% for the instrumental variable coefficient
of 16 year on 11 year) are small in relation to the difference between the
"complete case" and "full use of ancillary variables" datasets (1 v 2, max
difference 8.0% in the ordinary least squares coefficient of 16 year on 11 year).
However, the estimate of reliability at 11 years shows comparable variation
(1.1% v 1.9%). In contrast to the means, these coefficients showed larger
variation between the "complete case" and "key" datasets than between the key and
corrected datasets, the latter difference being comparable to that between the
corrected datasets. Notice that no information is available on the key datasets
for the relation between 7 and 11 year tests and for the 7 year variables as the
7 year variables are not included in the set of key variables.
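
The percentage changes quoted for runs 1a and 2 can be recovered from the coefficients reported for those runs if the change is expressed relative to the second run of each pair. That choice of base is inferred from the figures and is not stated in the text, so the sketch below should be read under that assumption; the function name is hypothetical.

```python
def pct_change(first_run, second_run):
    """Absolute change between two runs as a percentage of the second run.

    The choice of the second run as the base is an assumption inferred from
    the reported figures.
    """
    return 100.0 * abs(second_run - first_run) / second_run


# Coefficients reported for run 1a ("key") and run 2 (all ancillary variables):
# OLS and IV coefficients of 16 year on 11 year, and reliability at 11 years.
key_run = {"OLS (11,16)": 0.768, "IV (11,16)": 0.929, "r(11)": 0.827}
run_2 = {"OLS (11,16)": 0.772, "IV (11,16)": 0.941, "r(11)": 0.820}

for name in key_run:
    print(name, round(pct_change(key_run[name], run_2[name]), 1))
```

This gives 0.5, 1.3 and 0.9 for the three coefficients, matching the 1a v 2 comparison.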

Table 3 shows that the runs excluding 16 year data give the same pattern to the
estimates as those including 16 year data. However, systematic changes are found
with inclusion of extra ancillary variables, with at one extreme the inclusion
of only social class at 11 years (run 9) and of only social class at 11 years and
ancillary attainments at 11 years (run 8) producing no change from the "key"
dataset in the ordinary least squares and instrumental variable coefficients of
16 year on 11 year respectively. Deletion of 1 of the ancillary variables at each
age, as in runs 3, 4, produced small changes in the regression and reliability
coefficients as did deletion of variables measured at age 16 (run 6).



TABLE 3  Effect on regression and reliability coefficients of estimation using
         different combinations of ancillary variables

Run No.               OLS (11,16)  IV (11,16)  r(11)    OLS (7,11)  IV (7,11)  r(7)

1  (complete case)    0.710        0.883       0.804    0.0590      0.0811     0.728
1a (key)              0.768        0.929       0.827    /           /          /
2  (all ancillary)    0.772        0.941       0.820    0.0597      0.0772     0.773
3                     0.774        0.940       0.823    0.0596      0.0771     0.773
4                     0.770        0.930       0.829    0.0594      /          /
5                     0.768        /           /        0.0593      /          /
6                     0.772        0.936       0.824    0.0595      0.0772     0.771
7                     0.773        0.936       0.826    0.0595      0.0772     0.771
8                     0.771        0.929       0.829    0.0593      /          /
9                     0.768        /           /        0.0593      /          /

(OLS = ordinary least squares regression coefficient, IV = instrumental variable
regression coefficient, r = reliability estimate; / = not available.)

CONCLUSIONS


The estimates of instrumental variable regression and reliability coefficients
given in [reference illegible in source] were obtained on a dataset which involved
deletion of cases on which tests or teacher ratings on a number of attainments
and social class had missing values at any age. These included 27.5% of the
total cases, a similar percentage to the 32.1% of the "complete case" data above.
The present analyses suggest that the "complete case" data shows biases in
relation to the dataset deleting cases on key variables, the "key" dataset, both
in regression coefficients and reliability estimates and to a smaller extent in
mean values. The analysis of non response of Goldstein (in Fogelman (1976)) is
on different data than the "complete case" data used in some analyses in
Goldstein (1979) and on analyses in [reference illegible in source]. Furthermore
these analyses, in selecting non-responses on one test variable only, did not
adequately reveal the effect of non-response.

The interpolation procedure for missing values of Beale and Little allows an
increase in the effective information utilised in the data. This gives further
changes in the estimates, usually in the same direction as between "complete case"
and "key" datasets. These are large in comparison to the previous difference
for the mean values and differ little when different sets of ancillary variables
are used. In contrast for the regression and reliability estimates further
changes are of the same order as that between "complete case" and "key" datasets,
that for the instrumental variable estimate of regression of 16 year on 11 year
score being 1.3% and for the reliability of 11 year score being 0.9%. In addition
the use of different sets of ancillary variables gave different estimates within
roughly the same range. If the estimates obtained from the use of different
subsets of ancillary variables were reasonably constant we could be reasonably
happy in concluding, given the range of variables used, that the total effect
of "missingness" was largely taken up by these variables. As it is, it is possible
that further controls using extra ancillary variables will produce further
changes in the coefficients, perhaps of the order of 1-2%.

The missing values interpolation program has been shown to be effective in
utilising information on other relevant variables and the characteristics of the
data are seen to affect the estimates, particularly of the "higher order"
statistics such as regression and reliability estimates. Unlike many other
interpolation techniques (see Marini, Olsen and Rubin, 1980) the Beale and Little
technique produces unbiased estimates of these quantities when a certain
assumption about the missing values given the observed data, the "missing at
random" assumption, holds. The consistent change in coefficients as more and more
data in the study is utilised is seen as evidence that the correction of missing
values using these and possibly other variables should be a prerequisite for
further analyses with the data. Remaining problems for study are the calculation
of the effective degrees of freedom of the corrected data, the sampling
variability of the estimates produced and the utility of these estimates
with data where the proportion of missing values is large (the present data
had only 21% of the total data on the 10 variables missing).






REGRESSION OF ATTAINMENT ON SCHOOL SOCIAL MIX WITHIN SOCIAL CLASS - EFFECT OF
ERROR IN SOCIAL CLASS

By Russell Ecob

Schooling tends to be competitive: in terms of measured attainment for example,
some pupils succeed while others fail. It could be argued that our education
system is geared to produce just this result, but much educational research
seeks the social determinants of academic success as an aid to understanding
inequality in society in broader terms, and also as a guide to educational
policies that might reduce inequality.

The social class of a child's parents, usually measured by occupational
categories by British studies, has long been acknowledged as one of the most
important determinants of later academic success. The 'social mix' of a
school has been seen as of additional importance, particularly because it is
amenable to change as a policy tool, viz the interest in bussing of pupils
and the advocacy of socially mixed schools as opposed to community or
'neighbourhood schooling':

        A long-established strategy (for redistribution of
        educational resources) is through the socially mixed
        school where it is assumed that not only will children
        from all social backgrounds have the same access to
        resources but also, because of the presence of children
        who know how to demand, use and respond to resources
        effectively, those who would not otherwise do so fully
        will come to do so.

                                        (Eggleston, 1977, p 61)

In educational research, the framework for assessing the effect on pupil
attainment is expressed by the linear model:

        $y = \alpha_i + \beta x + \varepsilon$                (1)

where   $y$ is pupil attainment, often a standardised test score;
        $\alpha_i$ is the advantage to a pupil of social class i;
        $\beta x$ is the advantage to all pupils in a school due to its social
        mix x; and
        $\varepsilon$ is the residual attainment, assumed random.

The estimated coefficients ($\hat{\alpha}_i$) represent an average social class
effect within schools. The estimated $\hat{\beta}$ represents the effect of school
social mix, net of any differences due to pupil social class. In practice, of
course, the model is elaborated by the addition of further controls at both pupil
and school levels, such as the pupil's sex, family size, ethnic origin and
previous performance, and school size, type and location.

An early and influential report from the United States that examined this model
found school social mix to be the most important single determinant of attainment:
"Schools are remarkably similar in the way they relate to the achievement of
their pupils when the socio-economic background of the students is taken into
account" (Coleman et al, 1966, p 21). In Britain, Joan Barker Lunn (1971)
found a similar result; the Inner London Education Authority's Literacy Survey
also found children of all social classes attaining better in the schools with
more non-manual pupils in them, though the effect was greater for non-manual
children than for working class children (Mabey, 1974); the Plowden Report on
primary schooling (1967) advocated socially mixed schools as one method of
assisting disadvantaged children. A summary and discussion of research on
school social mix and pupil attainment is given by Simpson (1981).

Social class is prone to measurement error. Most of the studies mentioned have 
relied on the pupil's teacher for an assessment of their parental occupation, an 
assessment which is known not to be especially reliable.

This article is concerned with the effect that measurement error in social class
categories will have on the coefficients of model (1) above. In particular,
the estimated school social mix effect is shown to be considerably inflated by
such measurement error.




UNIT OF AGGREGATION FOR SOCIAL MIX

The models we present are general in as much as the Social Mix is considered
in relation to any aggregate of individuals. This can be either a classroom, a
year group or a whole school, and different choices may be more reasonable in
different contexts. We will use the word 'group' in the following description.
It should be noted that in the example used later, from the ILEA Literacy Survey,
this denotes year group, as information is only available on year groups, not on
individual classrooms. In the examples given in the introduction the unit
of aggregation was the school.

The unit of analysis in each case is the individual, who is conceived of as
having three relevant items of information: attainment, social class and social
mix of the group of which he/she is a member.

A MODEL WITH TWO SOCIAL CLASSES

We consider first the simple model comprising two social classes (1, 2) with the
same proportion in the population. We denote the true social class by $S_T$ and
the observed social class by $S_O$, and we suppose that the error of observation
is such that the conditional probability of observing the wrong social class is
$p$, independent of the true social class.

Thus $P(S_O = 2 \mid S_T = 1) = P(S_O = 1 \mid S_T = 2) = p$.

We suppose also that the relation of attainment to Social Mix is linear
within each true social class, having the same slope for each individual
social class, and that each group is of the same size (n). For simplicity
we use the total number, R, in the group who are in social class 1 as the
independent variable to represent the social mix.

We aim to express the observed slope in relation to the true slope and will 
expect the relationship to depend on three factors: 

(a) difference of intercept of the true regression lines, assumed parallel; 

(b) the distribution of social class mix within each true social class, and
their central tendencies;

(c) the conditional misclassification probability, p, which is assumed to be
independent of the Social Class Mix (SCM) and of attainment.

Let us first make the additional assumption that there is no outside influence
on class formation. We will later allow for outside influence.
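The setting described so far can be explored with a small simulation. The sketch below is illustrative only and is not taken from the report: it assumes groups of size n formed at random from two equally common classes, parallel true regression lines with slope `beta_true`, a class 1 advantage `alpha_gap` (a hypothetical value, here five times the maximum social mix effect n times beta), social mix measured without error, and each pupil's class label flipped with probability p. It then fits a straight line within observed social class 1.

```python
import random


def observed_slope(n=30, p=0.02, beta_true=1.0, alpha_gap=150.0,
                   n_groups=4000, seed=0):
    """Least-squares slope of attainment on social mix within OBSERVED class 1.

    All parameter values are illustrative assumptions, not figures from the report.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_groups):
        # Random class formation: each pupil is true class 1 with probability 1/2.
        true_classes = [1 if rng.random() < 0.5 else 2 for _ in range(n)]
        x = true_classes.count(1)  # social mix R = number of true class 1 members
        for s in true_classes:
            # Misclassify the pupil's class with probability p.
            observed = s if rng.random() >= p else 3 - s
            if observed != 1:
                continue
            alpha = alpha_gap if s == 1 else 0.0
            xs.append(x)
            ys.append(alpha + beta_true * x + rng.gauss(0.0, 1.0))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


if __name__ == "__main__":
    print("no misclassification:", round(observed_slope(p=0.0), 3))
    print("p = 0.02:            ", round(observed_slope(p=0.02), 3))
```

With no misclassification the fitted slope should stay close to the true value of 1, while even a 2% misclassification rate pushes it well above 1, which is the inflation the article goes on to quantify.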

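If R = number of Social Class 1 members in the group, then

$$ P(R = r) = \binom{n}{r} \left(\tfrac{1}{2}\right)^{n} . $$

Given R = r, P(true Social Class 1 member is observed) = $r/n$.

Using the relationship $P(R = r \mid S_T = 1)\, P(S_T = 1) = P(S_T = 1 \mid R = r)\, P(R = r)$,

$$ P(R = r \mid S_T = 1) = \binom{n-1}{r-1} \left(\tfrac{1}{2}\right)^{n-1} . $$

Similarly,

$$ P(R = r \mid S_T = 2) = \binom{n-1}{r} \left(\tfrac{1}{2}\right)^{n-1} . $$

Thus, the conditional distributions are identical, and binomial, apart from
a change in central tendency; the mean of the $S_T = 1$ distribution being $(n+1)/2$
and of the $S_T = 2$ distribution being $(n-1)/2$.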
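The conditional distributions of social mix can be checked numerically. The following sketch assumes, as in the text, two equally common classes, groups of size n formed with no outside influence (so R is binomial with index n and probability 1/2), and P(S_T = 1 | R = r) = r/n; the function names are illustrative.

```python
from math import comb


def conditional_mix_distributions(n):
    """Return P(R = r | S_T = 1) and P(R = r | S_T = 2) for r = 0..n."""
    marginal = [comb(n, r) * 0.5 ** n for r in range(n + 1)]
    # Bayes' rule with P(S_T = 1) = P(S_T = 2) = 1/2.
    given_1 = [(r / n) * marginal[r] / 0.5 for r in range(n + 1)]
    given_2 = [((n - r) / n) * marginal[r] / 0.5 for r in range(n + 1)]
    return given_1, given_2


def mean(dist):
    return sum(r * prob for r, prob in enumerate(dist))


def variance(dist):
    m = mean(dist)
    return sum((r - m) ** 2 * prob for r, prob in enumerate(dist))


if __name__ == "__main__":
    d1, d2 = conditional_mix_distributions(30)
    print(mean(d1), mean(d2))          # central tendencies of the two classes
    print(variance(d1), variance(d2))  # spread, the same for both classes
```

For n = 30 the two means differ by exactly one pupil and the two variances coincide, which is the sense in which the distributions are identical apart from a change in central tendency.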



This analysis will later be adapted to allow for outside influences on class
formation by supposing the two distributions are as follows

Ht+<*« % 1* 0 - K"-,") (U P ' • °' 

t ( * }*\ '■'-) '• IV " r ''° 

where $n \ge k \ge 1$. Useful values of k may be estimated from the observed data,
making allowance for the effect of measurement error. The reasonableness of the
binomial distributional assumption in this case will be tested on observed data.




true regression coefficient 

. 0 



Now let the equations for the true social classes be

$$ y = \alpha_1 + \beta_T x + \varepsilon \quad \text{for Social Class 1} $$

$$ y = \alpha_2 + \beta_T x + \varepsilon \quad \text{for Social Class 2} $$

where y = attainment (possibly corrected for intake)
and x = no. in Social Class 1

and where the errors are assumed to be identically and independently normally
distributed.

The slope of the relation of attainment (y) to Social Mix (x) within observed
social class 1 is obtained by fitting a straight line to the following situation:

[Figure: attainment (y) plotted against social mix (x); one regression line is
observed with probability (1 - p), the other with probability p.]



Then

$$ \beta_O = \frac{\sum_i \sum_j x_{ij}\, y_{ij}\, P(x_{ij})}{\sum_i \sum_j x_{ij}^2\, P(x_{ij})} $$

where i denotes social class and j is the number in the group in social class 1;
where $P(x_{ij})$ is the overall probability of observing $x_{ij}$; and
where $x_{ij}$ is the deviation from the overall mean
(or more generally, [illegible]),

if $\mu_1$ is the mean of social mix for Social Class 1 and
$\mu_2$ is the mean of social mix for Social Class 2
(or more generally, [illegible]).






Substituting for y and summing the separate components of the quadratic
terms we obtain

$$ \beta_O = \beta_T + \frac{p(1-p)(\mu_1 - \mu_2)(\alpha_1 - \alpha_2)}{\sigma^2 + p(1-p)(\mu_1 - \mu_2)^2} \qquad (1) $$

where $\sigma^2$ is the variance of social mix within each true social class.

Let j be defined by

$$ \alpha_1 - \alpha_2 = \frac{n \beta_T}{j} \qquad (2) $$

then when j = 1 the effect of changing
social class is the same as the maximum effect of social mix within social class.

Then for various values of K = k/n and p we calculate the multiplying fraction,
f, where $f = \beta_O / \beta_T$

.if 

where;- approximately, J _? _ 
where k - 1 wt? get 



(' " k) 



" fir (l 



. and wlion k - ? 






/•}•*! • 



(3) 



I -f- / ,.(>( i- fQ 
j 




Values of f are given in Table 1 for different values of p and K where n = 30.
To obtain plausible values of p we use data on repeated observations of reported
social class from the British Election Study and the Oxford Social Mobility Study.
These studies are described in the Appendix and give respectively values of p
of 0.02 and 0.04. It is not known how the size of error of measurement from
pupils' reports compares with these but we would expect them to be at least as
high.



TABLE 1: VALUES OF f FOR DIFFERENT VALUES OF p, j AND K

[Column headings for K are illegible; the fragments 0.1, 0.2 and 0.5 survive.]

                          K
p = 0.02
    j = 0.2     1.39     2.27     3.63     6.4
    j = 0.5     1.16     1.57     2.05     3.16
    j = 1       1.08     1.25     1.53     2.08

p = 0.05
    j = 0.2     1.95     4.28     7.0      8.59
    j = 0.5     1.38     2.31     3.4      4.03
    j = 1       1.19     1.66     2.20     2.52




EXTENSIONS OF THE MODEL TO SOCIAL CLASSES WITH DIFFERING PROPORTIONS 

Let q = the overall proportion of true social class 1 

The conditional binomial distributions now become

[yJ i M""" - fit- ••/*"•; 

This difference only enters in one term in equation (3), the value of f becoming
[equation illegible].

There is little overall effect on f of differences in q in the range $0.4 \le q \le 0.6$.
More extreme values within the range $0.3 \le q \le 0.7$ also have little effect when
K = 0.5 for the values of p in the range given, though when K = 0.1 or less
the term involving $q(1-q)$ dominates the denominator. We illustrate in
Table 2 the effect on f of varying q when there is no outside
influence on class formation (k = n). The value of 0.2 is used for j, this
not affecting the relative influence of variation in q.



TABLE 2: VALUES OF f = $\beta_O / \beta_T$ FOR DIFFERENT VALUES OF q, p

                          q
                0.1      0.2      0.3      0.4      0.5

p = 0.02        2.12     1.63     1.48     1.42     1.40
p = 0.05        3.68     2.52     2.16     2.0[?]   1.93
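The dependence of the multiplying fraction on q can be sketched in code. The formula below is a reconstruction under stated assumptions rather than a quotation from the report: there is no outside influence on class formation, so social mix within each true class is binomial with variance (n - 1)q(1 - q) and the two class means differ by one pupil; observed class 1 is treated as a (1 - p, p) mixture of the two true classes; and j is the ratio of the maximum social mix effect to the class gap, so that alpha_1 - alpha_2 = n beta_T / j. The function name `inflation_factor` is illustrative.

```python
def inflation_factor(n, p, j, q=0.5):
    """Multiplying fraction f = beta_O / beta_T with no outside influence.

    Assumed form (a reconstruction, not the report's printed equation):
        f = 1 + p(1-p) * gap * n / ( j * (var + p(1-p) * gap**2) )
    where gap is the difference in mean social mix between the two true
    classes and var is the within-class variance of social mix.
    """
    mean_gap = 1.0                      # mu_1 - mu_2 under random class formation
    within_var = (n - 1) * q * (1 - q)  # binomial variance with index n - 1
    mix = p * (1 - p)
    return 1.0 + mix * mean_gap * n / (j * (within_var + mix * mean_gap ** 2))


if __name__ == "__main__":
    for q in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(q, round(inflation_factor(n=30, p=0.02, j=0.2, q=q), 2))
```

Under these assumptions, with n = 30, j = 0.2 and p = 0.02, the sketch gives 2.12, 1.63, 1.48, 1.42 and 1.40 for q = 0.1 to 0.5, in agreement with the p = 0.02 row of Table 2. The case of outside influence (k less than n) is not covered, since the corresponding distributions cannot be read from the source.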





For more than two social classes the above analysis can be adapted, taking
social classes in pairs and redefining the Social Mix to be the relative number
in each class. The analysis only strictly applies, however, when the sum of individuals
in the two social classes under consideration is constant in every group, and it
does not take account of the influence of social classes not included.



APPLICATION TO DATA FROM THE ILEA LITERACY SURVEY

Data on attainment of all 17564 pupils aged 8 in 503 ILEA schools in 1968
were used.

The value of $j = n\beta_T / (\alpha_1 - \alpha_2)$
is not directly calculable as it depends on the unknown parameters $\beta_T$, $\alpha_1$ and $\alpha_2$
of the true regression lines. However, $\beta_O$ and $\beta_T$ are found to be
related by the equation

o 

fir * -/fu, - (« ~ p ( '• r) - 

As n, the size of the year group, is not constant, the average value is
used. The social class division is that between manual and non-manual.

The following values of the parameters were found:

n = 43.21

k = 6.708

q = 0.234



Prom Equation 4 we obtain for, for ^ - o 



and 



for 



O - Of 



So, using the conservative estimate of 0.02 for p, a reduction of 42% of $\beta$
from its former value is obtained, the influence of social mix changing sign between
this and the higher estimate of measurement error.



REFERENCES

Barker Lunn, J (1971) Social Class, Attitudes and Achievement. Slough, NFER.

Coleman, J et al (1966) Equality of Educational Opportunity. Washington DC,
US Department of Health, Education and Welfare.

Department of Education and Science (1967) Children and their Primary Schools.
London, HMSO.

Eggleston, J (1977) Ecology of School. London, Methuen.

Mabey, C (1974) Social and ethnic mix and the relationship with attainment of
children aged 8 and 11. London, Centre for Environmental Studies.

Simpson, L (1981) Statistical assessment of school effects using educational
survey data. Doctoral thesis to be presented to the University of London.




APPENDIX 10 



A DESCRIPTION OF TWO DATASETS USED TO ESTIMATE MEASUREMENT ERROR IN SOCIAL
CLASS. By Russell Ecob



1. THE BRITISH ELECTION STUDY (BES)

Reinterview data gathered eight months apart was obtained from the
British Election Study of the University of Essex, directed by
Professor Bo Sarlvik and Ivor Crewe and conducted in February and
October 1974, each interview following a General Election. Out of a
sample of 1830 interviewed at both times, 1656 individuals were
selected where the respondent was employed at both times. The analysis
was on the 1097 reporting their own occupation, eliminating those wives
reporting their husband's occupation. The data was taken from an
analysis by Fox and Alt (1976). Detailed job descriptions were obtained
from the same person in both surveys and these were allocated to
Occupational Unit Groups (OUGs). For the OUGs which were different
at each occasion, a distinction was made between "genuine" and "spurious"
change. The genuine changes in OUG are those believed to be caused by
a genuine change in job, the spurious changes being those which, on
examination of all occupation-related material, were believed to be caused
by a description of the same job in a different way on each occasion.
In addition, some changes in OUG are caused by coder error on either
occasion. The reliability of social class coding is investigated on
the subsample formed by eliminating those "genuine" OUG changes, constituting
3.4% of the total sample, which cause a change in social class.




2. THE OXFORD SOCIAL MOBILITY STUDY (OSMS)

The Oxford Social Mobility Study consists of a national survey in 1972 of
10309 men aged between 20 and 64, resident in England and Wales, who were
asked about their own and their father's education, their present occupation
and their father's occupation when they were 14. Two years later
a reliability study was undertaken (Hope, Graham and Schwartz, 1979)
which involved re-interviewing a representative 10% of the sample.
The present data given in Table 2 comprises those 565 subjects who,
when re-interviewed in 1974, maintained that their occupation was
the same as that in 1972. This subsample has been shown to be
representative of the complete sample. In terms of the six Registrar
General's Classes, 28% of the subjects showed a change in Social Class
in this period and for the aggregation into three classes given here,
the figure was 10.3%.

The following breakdown into manual (m) and non-manual (nm) social classes was
found at each interview occasion for the two studies:

                  nm      m                       nm      m

BES       nm     553     19        OSMS   nm     164     28

          m       21    441               m       18    355

Assuming that the conditional misclassification probabilities (p) are
identical for both classes, we obtain a value of 0.0195 for the BES study
and a value of 0.042 for the OSMS study.
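One way such a value might be obtained from a two-occasion table is sketched below. The estimator is an assumption, since the text does not state how p was calculated: it supposes that true class does not change between occasions, that errors on the two occasions are independent with the same probability p, and hence that the expected proportion of discordant reports is 2p(1 - p). The function name is illustrative.

```python
from math import sqrt


def misclassification_probability(table):
    """Estimate p from a 2x2 table of class reports on two occasions.

    Assumes the discordant proportion d satisfies d = 2p(1 - p) and
    returns the smaller root, p = (1 - sqrt(1 - 2d)) / 2.
    """
    (a, b), (c, d) = table
    discordant = (b + c) / (a + b + c + d)
    return (1.0 - sqrt(1.0 - 2.0 * discordant)) / 2.0


# Cell counts as printed in the breakdown above (nm/m on each occasion).
BES = ((553, 19), (21, 441))
OSMS = ((164, 28), (18, 355))

if __name__ == "__main__":
    print("BES: ", round(misclassification_probability(BES), 4))
    print("OSMS:", round(misclassification_probability(OSMS), 4))
```

Under these assumptions the estimates come out at roughly 0.020 for the BES table and 0.043 for the OSMS table, close to but not identical with the reported 0.0195 and 0.042, so the calculation actually used may have differed in detail.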






